DESCRIPTION



MUTATIONS IN THE DIABETES SUSCEPTIBILITY GENES HEPATOCYTE NUCLEAR FACTOR (HNF)
1 ALPHA (α), HNF-1β AND HNF-4α

BACKGROUND OF THE INVENTION 

1. Field of the Invention 

The present invention relates generally to the field of diabetes. More particularly, it concerns the
identification of genes responsible for diabetes for use in diagnostics and therapeutics.

2. Description of Related Art 

Diabetes is a major cause of health difficulties in the United States. Non-insulin-dependent 
diabetes mellitus (NIDDM, also referred to as Type 2 diabetes) is a major public health disorder of glucose
homeostasis affecting about 5% of the general population in the United States. The causes of the fasting 
hyperglycemia and/or glucose intolerance associated with this form of diabetes are not well understood. 

Clinically, NIDDM is a heterogeneous disorder characterized by chronic hyperglycemia leading to
progressive micro- and macrovascular lesions in the cardiovascular, renal and visual systems as well as 
diabetic neuropathy. For these reasons, the disease may be associated with early morbidity and 
mortality. 

Subtypes of NIDDM can be identified based, at least to some degree, on the time of onset of
the symptoms. The principal type of NIDDM has onset in mid-life or later. Early-onset NIDDM, or
maturity-onset diabetes of the young (MODY), shares many features with the more common form(s) of
NIDDM whose onset occurs in mid-life. MODY is a form of
non-insulin-dependent (Type 2) diabetes mellitus (NIDDM) that is characterized by an early age at onset,
usually before 25 years of age, and an autosomal dominant mode of inheritance (Fajans 1989). Except
for these features, the clinical characteristics of patients with MODY are similar to those with the more
common late-onset form(s) of NIDDM.

Although most forms of NIDDM do not exhibit simple Mendelian inheritance, the contribution of
heredity to the development of NIDDM has been recognized for many years (Cammidge 1928) and the
high degree of concordance of NIDDM in monozygotic twin pairs (Barnett et al. 1981) indicates that
genetic factors play an important role in its development.

MODY is characterized by its early age of onset, which is during childhood, adolescence or young
adulthood and usually before the age of 25 years. It has a clear mode of inheritance, being autosomal
dominant. Further characteristics include high penetrance (of the symptomology), and availability of
multigenerational pedigrees for genetic studies of NIDDM. MODY occurs worldwide and has been found
to be a phenotypically and genetically heterogeneous disorder.

A number of genetically distinct forms of MODY have been identified. Genetic studies have shown
tight linkage between MODY and DNA markers on chromosome 20, this being the location of the MODY1
gene (Bell et al., 1991; Cox et al., 1992). MODY2 is associated with mutations in the glucokinase gene
(GCK) located on chromosome 7 (Froguel et al., 1992 and 1993). Recent linkage studies have shown the
existence of a further form of MODY which has been termed MODY3 (Vaxillaire et al., 1995). MODY3 has
been shown to be linked to chromosome 12 and is localized to a 5 cM region between markers D12S86
and D12S807/D12S820 of the chromosome (Menzel et al., 1995).

Although it is well established that MODY2 is associated with mutations in GCK, there is still no
information as to the identity of other MODY genes. There is a clear need to identify those genes and the
mutations that result in diseased states. The identification of these genes and their products will
facilitate a better understanding of the diseased states associated with mutations in these genes and has
important implications in the diagnosis and therapy of MODY.

Since an understanding of the molecular basis of diabetes in general and MODY specifically may
facilitate the development of therapeutic [...], methods are
needed to identify diabetes-susceptibility genes associated with MODY. Moreover, methods of detecting
individuals with a propensity to develop such diseases [...]. The
mechanism underpinning the genetic lesion should be determined in order to allow diagnosis and
specifically-directed therapy.

SUMMARY OF THE INVENTION

The present invention relates to the inventors' discovery that the MODY3 locus is the HNF1α gene,
the MODY1 locus is the HNF4α gene and the MODY4 locus is HNF1β. The invention further relates to
the discovery that analysis of mutations in the HNF1α, HNF1β and HNF4α genes can be diagnostic for
diabetes. The invention also contemplates methods of treating diabetes in view of the fact that
mutations in HNF1α, HNF1β and HNF4α can cause diabetes.

In one embodiment, the invention contemplates methods for screening for diabetes mellitus.
These methods comprise: obtaining sample nucleic acid from an animal; and analyzing the nucleic acid to
detect a mutation in an HNF-encoding nucleic acid segment; wherein a mutation in the HNF-encoding nucleic
acid is indicative of a propensity for non-insulin-dependent diabetes.

In certain embodiments the HNF-encoding nucleic acid is an HNF1α-encoding nucleic acid. In
view of the inventors' discovery that the MODY3 locus is HNF1α, a mutation in the HNF1α-encoding
nucleic acid is indicative of a propensity for diabetes. In some presently preferred embodiments, the
HNF1α-encoding nucleic acid is located on human chromosome 12q, which is the location of the
MODY3 locus. In other embodiments, the HNF-encoding nucleic acid is an HNF4α-encoding nucleic acid.
In view of the inventors' discovery that the MODY1 locus is HNF4α, a mutation in the HNF4α-encoding
nucleic acid is indicative of a propensity for diabetes. In some presently preferred embodiments, the
HNF4α-encoding nucleic acid is located on human chromosome 20, which is the location of the MODY1
locus.

It is important to note that the terms NIDDM, MODY, MODY1, MODY3, and MODY4 are used to
designate diabetes disease states, and the use of a particular such name may not always represent the
same causation of that disease state. The inventors have discovered that mutations in HNF4α can lead
to a MODY1 disease state; however, not all mutations in HNF4α that lead to diabetes might cause a
"MODY1" disease state. Conversely, not all diabetes disease states brought about by a mutation in
HNF4α might be considered a MODY1 disease state. Therefore, Applicants prefer to use, in some cases,
"HNF4α-diabetes" to note any diabetic disease state brought on by a mutation or malfunction of HNF4α,
even those that do not exhibit all, or any, MODY1 disease states. Likewise, Applicants may use
"HNF1α-diabetes" and "HNF1β-diabetes" rather than "MODY3" and "MODY4", respectively.

The nucleic acid to be analyzed can be either RNA or DNA. The nucleic acid can be analyzed in a
whole tissue mount, a homogenate, or, preferably, isolated from tissue to be analyzed. In some preferred
embodiments, the step of analyzing the HNF-encoding nucleic acid comprises sequencing of the
HNF-encoding nucleic acid to obtain a sequence; the sequence may then be compared to a native nucleic acid
sequence of HNF to determine a mutation. Such a native nucleic acid sequence of HNF1α may have the
sequence set forth in SEQ ID NO:1. Such a native nucleic acid sequence of HNF4α has a sequence set
forth in SEQ ID NO:78.
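
The comparison of a sample sequence against a native sequence can be illustrated with a minimal Python sketch. It assumes that the two coding sequences are already aligned and of equal length, so it reports substitutions only; the function name `find_substitutions` and the short toy sequences are hypothetical and are not taken from SEQ ID NO:1 or SEQ ID NO:78.

```python
def find_substitutions(native, sample, cds_start=0):
    """Return (codon_number, native_base, sample_base, offset) for each mismatch.

    Assumes both strings are aligned coding sequences of equal length and that
    the first codon begins at offset `cds_start` (an assumption of this sketch).
    """
    if len(native) != len(sample):
        raise ValueError("this sketch only compares equal-length sequences")
    hits = []
    for offset, (ref, alt) in enumerate(zip(native.upper(), sample.upper())):
        if ref != alt:
            codon_number = (offset - cds_start) // 3 + 1  # codons are numbered from 1
            hits.append((codon_number, ref, alt, offset))
    return hits


# Hypothetical toy sequences: a single C-to-T change in the second codon.
example_hits = find_substitutions("ATGCGAGGT", "ATGTGAGGT")
```

Insertions and deletions change the sequence length and would need a proper alignment step before a comparison of this kind.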

The method allows for the diagnosis of almost any mutation, including, for example, point
mutations, translocation mutations, deletion mutations, and insertion mutations. The method of analysis
may comprise PCR, an RNase protection assay, an RFLP procedure, etc. Using this method, the inventors
have diagnosed a variety of HNF1α mutations, including those set forth in Table 8. In preferred
embodiments mutations occur at codons 17, 7, 27, 55/56, 98, 131, 122, 142, 129, 131, 159, 171, 229,
241, 272, 288, 289, 291, 292, 273, 379, 401, 443, 447, 459, 487, 515, 519, 547, 548 or 620 of an
HNF1α-encoding nucleic acid, for example, having the sequence of SEQ ID NO:1. In other
preferred embodiments a mutation occurs at the splice acceptor region of intron 5 and exon 6 of an
HNF1α-encoding nucleic acid. In other embodiments a mutation occurs at the splice acceptor region of
intron 9 of an HNF1α-encoding nucleic acid. In other embodiments, the mutation occurs, independently, in
intron 1, intron 2, intron 5, intron 7 or intron 9 of the HNF1α gene. The inventors have also found a variety
of HNF4α mutations, including those found in Table 10. In some preferred embodiments, the
HNF-encoding nucleic acid is an HNF4α-encoding nucleic acid and a mutation occurs in exon 7 of the
HNF4α-encoding nucleic acid. In other preferred embodiments, a mutation occurs at codon 268, 127, 130 or 154
of an HNF4α-encoding nucleic acid having the sequence of SEQ ID NO:78.

The invention also contemplates methods of treating diabetes in an animal comprising: 
diagnosing an animal that has diabetes and modulating HNF function in the animal. 

The step of diagnosing an animal with diabetes frequently comprises analysis of an HNF1α-encoding
nucleic acid sequence or an HNF4α-encoding nucleic acid sequence for a mutation.

The step of modulating HNF function may comprise providing an HNF1α or HNF4α polypeptide to
the animal. In cases where normal HNF1α or HNF4α function is sought to be revived, the HNF1α or
HNF4α polypeptide may be a native HNF1α or HNF4α polypeptide. For example, a native HNF1α
polypeptide may have the sequence of SEQ ID NO:2. A native HNF4α polypeptide may have the sequence of SEQ
ID NO:79. The provision of an HNF1α or HNF4α polypeptide is accomplished in any of a number of
ways. For example, expression of an HNF1α or HNF4α polypeptide may be induced, with the expression
being of an HNF1α or HNF4α polypeptide encoded in the animal's genome or of an HNF1α or HNF4α
polypeptide encoded by a nucleic acid provided to the animal. The provision of an HNF1α or HNF4α
polypeptide may be accomplished by a method comprising introduction of an HNF1α or HNF4α-encoding
nucleic acid to the animal, for example, by injecting the HNF1α or HNF4α-encoding nucleic acid into the
animal.

Modulating HNF function in the animal can comprise providing a modulator of HNF1α or HNF4α
function to the animal. Such modulators are in the nature of drugs and can be, for example, HNF4, HNF6,
HNF3 or any other peptide or molecule that regulates HNF1α. These modulators may be formulated into
a pharmaceutical compound for delivery to the animal. The modulator of HNF1α, HNF1β or HNF4α
function may be an agonist or antagonist of HNF1α, HNF1β or HNF4α. The modulator may modulate
transcription of an HNF1α, HNF1β or HNF4α-encoding nucleic acid, translation of an HNF1α, HNF1β or
HNF4α-encoding nucleic acid, or the functioning of the HNF1α, HNF1β or HNF4α polypeptide.

The invention also contemplates methods of screening for modulators of HNF function
comprising: obtaining an HNF polypeptide, for example an HNF1α, HNF1β or HNF4α polypeptide;
determining a standard activity of the HNF; contacting the polypeptide with a putative modulator; and
assaying for a change in the standard activity of the polypeptide. In some preferred methods, the
standard activity profile of an HNF1α polypeptide is determined by measuring the binding of the HNF1α
polypeptide to a nucleic acid segment comprising the sequence of SEQ ID NO:9. To facilitate measuring
the HNF1α activity, the nucleic acid segment comprising the sequence of SEQ ID NO:9 or the HNF1α
polypeptide may comprise a detectable label. In some preferred methods, the standard activity profile of
an HNF4α polypeptide is determined by measuring the binding of the HNF4α polypeptide to a nucleic acid
segment comprising the sequence of SEQ ID NO:85. To facilitate measuring the HNF4α activity, the
nucleic acid segment comprising the sequence of SEQ ID NO:85 or the HNF4α polypeptide may comprise
a detectable label. In other embodiments, the standard activity profile of an HNF polypeptide is
determined by determining the ability of an HNF1α polypeptide to stimulate transcription of a reporter
gene, the reporter gene operatively positioned under control of a nucleic acid segment comprising the
sequence of SEQ ID NO:1. In other embodiments, the standard activity profile of an HNF polypeptide is
determined by determining the ability of an HNF4α polypeptide to stimulate transcription of a reporter
gene, the reporter gene operatively positioned under control of a nucleic acid segment comprising the
sequence of SEQ ID NO:78. Similar assays are contemplated for HNF1β polypeptide.

The invention also contemplates methods of screening for modulators of HNF polypeptide
function comprising: obtaining an HNF1α, HNF1β or HNF4α-encoding nucleic acid segment; determining
a standard transcription and translation activity of the HNF1α, HNF1β or HNF4α-encoding nucleic acid
sequence; contacting the HNF1α or HNF4α-encoding nucleic acid segment with a putative modulator;
maintaining the nucleic acid segment and putative modulator under conditions that normally allow for
HNF1α or HNF4α transcription and translation; and assaying for a change in the transcription and
translation activity.
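
The screening methods above share a common pattern: a standard activity is measured, the putative modulator is added, and the activity is measured again. The following Python sketch shows one way such readings might be compared. It is an illustration under stated assumptions only: the function name `classify_modulator`, the 1.5-fold cutoff and the example readings are hypothetical and are not specified anywhere in this description.

```python
from statistics import mean


def classify_modulator(standard, treated, fold_cutoff=1.5):
    """Compare replicate activity readings with and without a putative modulator.

    `standard` and `treated` are lists of readings in the same arbitrary units
    (for example, a reporter-gene signal). The fold cutoff is an assumed value.
    """
    if not standard or not treated:
        raise ValueError("need at least one reading per condition")
    baseline = mean(standard)
    if baseline <= 0:
        raise ValueError("standard activity must be positive")
    fold = mean(treated) / baseline
    if fold >= fold_cutoff:
        label = "increases activity"
    elif fold <= 1 / fold_cutoff:
        label = "decreases activity"
    else:
        label = "no clear change"
    return fold, label


# Hypothetical readings: the treated wells show about twice the standard activity.
fold, label = classify_modulator([100, 110, 90], [210, 190, 200])
```

A real assay would also need replicate statistics and controls; the fixed cutoff here only marks where such a decision rule would sit.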

The inventors' discovery allows for the preparation of a host of HNF modulators such as
MODY3/HNF1α modulators, MODY4/HNF1β modulators and MODY1/HNF4α modulators. Such
modulators themselves are within the scope of the invention. Such an HNF modulator may be prepared or
preparable by a process comprising screening for modulators of HNF function comprising: obtaining an
HNF polypeptide; determining a standard activity profile of the HNF polypeptide; contacting the HNF
polypeptide with a putative modulator; and assaying for a change in the standard activity profile. An HNF
modulator may also be prepared by a process comprising screening for modulators of HNF function comprising:
obtaining an HNF-encoding nucleic acid segment; determining a standard transcription and translation
activity of the HNF-encoding nucleic acid sequence; contacting the HNF-encoding nucleic acid segment with a
putative modulator; maintaining the nucleic acid segment and putative modulator under conditions that
normally allow for HNF transcription and translation; and assaying for a change in the transcription and
translation activity.

Some aspects of the invention relate to isolated and purified polynucleotides encoding an HNF
polypeptide. Such polynucleotides can be: an HNF1α-encoding nucleic acid, an HNF1β-encoding nucleic
acid sequence, or an HNF4α-encoding nucleic acid. In some particular embodiments, the polynucleotide
encodes an HNF1α having an amino acid sequence as set forth in SEQ ID NO:127. In preferred
embodiments, the polynucleotide may be an HNF1α-encoding nucleic acid sequence having the sequence of
SEQ ID NO:126. In additional particular embodiments, the polynucleotide encodes an HNF1β having an
amino acid sequence as set forth in SEQ ID NO:139. In preferred embodiments, the polynucleotide may
be an HNF1β-encoding nucleic acid sequence having a sequence of SEQ ID NO:128. The polynucleotide
may encode an HNF4α having an amino acid sequence as set forth in SEQ ID NO:140. In preferred
embodiments, the polynucleotide may be an HNF4α-encoding nucleic acid sequence having the sequence of
SEQ ID NO:130.

Other embodiments comprise isolated and purified nucleic acid segments comprising 10, 14, 15,
25, 30, 35, 40, 45, 50, 55, 60, 70, 80, 90, 100, 125, 150, 175, 200, 250, 300, 350, 400, 450, or 500
contiguous nucleic acids identical to the sequence of SEQ ID NO:128 or SEQ ID NO:126 or the
complement of these sequences. These nucleic acid segments can be used by those of skill in the art as
hybridization probes, PCR primers, for the expression of HNF polypeptides, for the expression of other
polypeptides, etc. In some embodiments, the segment encodes a full-length HNF polypeptide. Of
particular interest are the promoters for HNF1α and HNF1β, which are disclosed in SEQ ID NOS:126 and
128, respectively, and in FIGs. 26 and 27, respectively, and discussed elsewhere in this application. These
promoters may be used by those of skill in the art in many varying applications.

BRIEF DESCRIPTION OF THE DRAWINGS

The following drawings form part of the present specification and are included to further
demonstrate certain aspects of the present invention. The invention may be better understood by
reference to one or more of these drawings in combination with the detailed description of specific
embodiments presented herein.

FIG. 1. Pedigrees of MODY3 families. The individuals studied in the Clinical Research Center at
the University of Chicago are indicated by MD-1-5 and 813 and those with NIDDM, IGT and NGT are
shown by black symbols, shaded symbols and open symbols, respectively. The asterisks indicate that
these individuals have inherited the at-risk haplotype associated with MODY3 in that family. The
genotypes and haplotypes for the P family have been described (Menzel et al., 1995) and the pairwise lod
score between MODY and the D12S76/D12S321 haplotype in this family is 2.06 at a recombination
fraction of 0.00. The pairwise lod score between MODY and D12S76 in pedigree F549 is 0.65 at a
recombination fraction of 0.00 (Vaxillaire et al., 1995). The pedigrees BDA1 and BDA12 have not been
previously described. MODY co-segregates with markers tightly linked to MODY3 in these families with
pairwise lod scores between MODY and D12S86 of 1.94 and 0.60, respectively, at a recombination
fraction of 0.00.

FIG. 2. Average glucose (A), insulin (B) and insulin secretion rate (ISR) (C) profiles in 7 diabetic
MODY3 subjects (□), 6 nondiabetic MODY3 subjects and 6 control subjects (○), during the stepped
glucose infusion studies. After a 30 min period of baseline sampling, glucose was infused at rates of 1,
2, 3, 4, 6, and 8 mg·kg⁻¹·min⁻¹. Each infusion rate was administered for a period of 40 min and glucose,
insulin and C-peptide were measured at 10, 20, 30 and 40 min into each period.

FIG. 3. Relationship between average plasma glucose concentrations and ISRs during the
stepped glucose infusion studies in 7 diabetic MODY3 subjects (□), 6 nondiabetic MODY3 subjects
and 6 control subjects (○). The lowest glucose levels and ISRs were measured under basal conditions,
and subsequent levels were obtained during glucose infusion rates of 1, 2, 3, 4, 6 and 8 mg·kg⁻¹·min⁻¹,
respectively.

FIG. 4. Graded intravenous glucose infusions were administered to 6 controls (A), 6 nondiabetic
MODY3 subjects (B) and 7 diabetic MODY3 subjects (C) after an overnight fast (baseline) and after a
42-h intravenous infusion of glucose (postglucose (□)) at a rate of 4-6 mg·kg⁻¹·min⁻¹.

FIG. 5A, FIG. 5B, FIG. 5C, FIG. 5D, FIG. 5E, FIG. 5F and FIG. 5G. MODY3 pedigrees showing
co-segregation of mutant HNF1α allele with diabetes mellitus. Males are noted by square symbols and
females by circles. Individuals with NIDDM are noted by black symbols and those with gestational-onset
diabetes or impaired glucose tolerance by shaded symbols. A diagonal line through the symbol indicates
that the individual is deceased.

The individual ID is noted at the top right corner of each symbol and the HNF1α genotype, if
determined, noted below: N, normal allele; M, mutant allele. The arrow indicates the individual from each
pedigree who was screened for mutations. Note that some individuals have inherited the mutant allele but
do not yet have NIDDM, usually because of their young age (e.g. P pedigree, individual IV-6; and Ber
pedigree, individual V-2). Also, some individuals have NIDDM even though they did not inherit the mutant
HNF1α allele segregating in that family (e.g. Ber pedigree, individual II-2). Such heterogeneity has been
noted previously (Bell et al. 1991) and is a reflection of the high prevalence of NIDDM.

FIG. 6. The involvement of hepatocyte nuclear factors in diabetes.

FIG. 7. An alignment of the HNF4α protein sequence from humans (h) with sequences from
human, mouse (m), Xenopus (x) and Drosophila (d) species. The putative DNA binding sites are underlined
and the putative ligand binding sites are in bold.



PCT/US97/16037

WO 98/11254 


FIG. 8A, FIG. 8B, FIG. 8C, FIG. 8D, FIG. 8E, FIG. 8F, FIG. 8G, FIG. 8I, FIG. 8H, FIG. 8I. The
DNA sequences for exon 1, exon 2, exon 3, exon 4, exon 5, exon 6, exon 7, exon 8, exon 9 and exon 10 of
HNF4α.

FIG. 9. Physical map of the MODY3 region of chromosome 12. YAC, BAC (b) and PAC (p) clones are
represented as lines, the length of which reflects the number of included STSs and not the actual size. The
physical distance between adjacent STSs has not been determined directly and STSs for which the order
has not been unambiguously determined are indicated in brackets. A circle indicates that the clone was
positive for the indicated STS and a square indicates an STS derived from the end of that specific clone.
Several YACs contain large internal deletions which are noted by brackets. The STSs are from GDB™ and
the GenBank STS databases.

FIG. 10. Partial sequence of exon 4 of the HNF1α gene of individual EA1 (Edinburgh pedigree). The
sequences of the normal and mutant alleles are shown. There is an insertion of a C in codon 291 (noted by
the arrowhead) in the mutant allele resulting in a frameshift and premature termination.
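
The effect of a single-base insertion of the kind described for FIG. 10 can be illustrated with a minimal Python sketch. It uses the standard genetic code and a short made-up coding sequence; the sequence, the insertion position and the helper names `translate` and `insert_base` are hypothetical and do not reproduce the HNF1α exon 4 sequence.

```python
from itertools import product

# Standard genetic code, built in TCAG order; "*" marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(cds):
    """Translate a coding sequence from its first base, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        amino_acid = CODON_TABLE[cds[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def insert_base(cds, index, base):
    """Return the sequence with one base inserted before position `index`."""
    return cds[:index] + base + cds[index:]


normal = "ATGGCCCCCACTGAGGTCTAA"      # toy sequence: M A P T E V, then a stop codon
mutant = insert_base(normal, 6, "C")  # one extra C inside the run of C residues

normal_protein = translate(normal)
mutant_protein = translate(mutant)    # shifted frame meets a TGA stop codon early
```

In this toy case the inserted C shifts the reading frame so that a TGA triplet comes into frame, which truncates the translated product.
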
FIG. 11. The cDNA sequence of HNF1α denoting position of the exons.

FIG. 12. Model of the human HNF-4α showing the different patterns of alternative splicing and
structures of the different forms of HNF-4α that can be generated by alternative splicing. The amino
acids that define the boundaries of some of the regions of the protein are shown. DBD and LBD
correspond to the DNA and ligand-binding domains of HNF-4α, respectively.

FIG. 13. Comparison of the sequences of the promoter regions of the human and mouse HNF-4α
genes (SEQ ID NO:135 and SEQ ID NO:137, respectively). Identical residues are shown in boxes. The
binding sites for transcription factors that may regulate the expression of HNF4α are overlined. The
asterisk notes the predicted transcriptional start site based on the study of the mouse HNF-4α gene
(Zhong et al., 1994). The minimal promoter region required for high-level expression of the mouse gene in
hepatoma cells is shown by shading. The ATG codon which defines the start of translation is noted. The
arrowhead shows the DNA polymorphism found in the promoter region of the proband of family J2-96.
The GenBank accession nos. for the mouse promoter sequence are S74519 and S77762.

FIG. 14A and FIG. 14B. Partial sequence of exon 4 of HNF4α gene of patient J2-21. The
sequences of the normal (FIG. 14A, SEQ ID NO:141 and corresponding amino acids SEQ ID NO:142) and
mutant (FIG. 14B, SEQ ID NO:143) alleles are shown and the arrow indicates the C→T substitution at
codon 127.

FIG. 15. Pedigrees of Japanese families with mutations/polymorphisms in the HNF-4α gene.
Individuals with diabetes are noted by filled symbols and nondiabetic (or not tested) individuals are
indicated by open symbols. The arrow indicates the proband. The clinical features of each subject are
shown including age at diagnosis, present age and present treatment. The HNF4α genotype of tested
individuals is noted: N, normal; and M, mutation/polymorphism.

FIG. 16. Identification of a nonsense mutation in the HNF4α gene in a German family, the
Dresden-11 pedigree. The members of this family with MODY and impaired glucose tolerance are
indicated with black and shaded symbols, respectively. The age at diagnosis of diabetes mellitus, present
age and therapy (OHA, oral hypoglycemic agents), and nature of complications (M, macrovascular disease;
R, retinopathy; and N, peripheral polyneuropathy) are indicated. The haplotype associated with MODY in
this family is shown.

FIG. 17. Partial sequence of exon 4 of the HNF4α gene of subject II-4 of the Dresden-11
pedigree. The R154X mutation is indicated (SEQ ID NO:144 and SEQ ID NO:145). Intron 4 follows the
Gln codon, CAG.
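
The designation R154X denotes replacement of the arginine (R) codon at position 154 by a stop codon (X). As a minimal sketch of what such a change means at the nucleotide level, the following Python code lists the single-base substitutions that convert a codon for a given amino acid into a stop codon under the standard genetic code. The function name `nonsense_routes` is hypothetical, and the sketch makes no statement about which nucleotide change is actually present in the Dresden-11 family.

```python
from itertools import product

# Standard genetic code, built in TCAG order; "*" marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def nonsense_routes(amino_acid):
    """Return sorted (codon, stop_codon) pairs reachable by one base substitution."""
    routes = []
    for codon, encoded in CODON_TABLE.items():
        if encoded != amino_acid:
            continue
        for position in range(3):
            for base in BASES:
                if base == codon[position]:
                    continue  # not a substitution
                mutant = codon[:position] + base + codon[position + 1:]
                if CODON_TABLE[mutant] == "*":
                    routes.append((codon, mutant))
    return sorted(routes)


arginine_routes = nonsense_routes("R")
```

For arginine, only the CGA and AGA codons can become a stop codon (TGA) through a single substitution.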

FIG. 18A, FIG. 18B, FIG. 18C and FIG. 18D. Oral glucose tolerance testing in the Dresden-11
family. The blood glucose (FIG. 18A), insulin (FIG. 18B), C-peptide (FIG. 18C) and proinsulin (FIG. 18D)
levels during the course of the glucose tolerance test are shown. The open symbols are the means±SEM
for subjects with the R154X mutation, including those with diabetes and impaired glucose tolerance, and
the filled symbols are the means for the two normal subjects.

FIG. 19A, FIG. 19B, FIG. 19C and FIG. 19D. Effect of bolus and infusion of arginine, of
glucose, and of arginine during hyperglycemic clamp on plasma concentration of glucose (FIG. 19A),
insulin (FIG. 19B), C-peptide (FIG. 19C), and glucagon (FIG. 19D) in 3 groups of subjects of the RW
pedigree.



FIG. 20A and FIG. 20B. Acute insulin (FIG. 20A) and C-peptide (FIG. 20B) response to bolus
administration of arginine in 3 groups of subjects of the RW pedigree at baseline and during the
hyperglycemic clamp procedure. The slope of the line connecting these insulin responses (slope of
potentiation) was lower in ND[+] vs. ND[-], p < 0.001. The slope for D[+] was lowest.

FIG. 21. MODY pedigree, Italy 1. Subjects with MODY and impaired glucose tolerance are
indicated by filled and cross-hatched symbols, respectively. Nondiabetic subjects (by testing or history)
are indicated by open symbols. The clinical features of the subjects are noted below the symbol including
current treatment: insulin or oral hypoglycemic agent (OHA). The haplotype at the markers
D12S321-D12S76-UC-39 is shown and the at-risk haplotype is noted by shading. The HNF-1α genotype is shown:
N, normal; M, mutant (A→C substitution at nucleotide -58). Although treated with insulin, subject III-9 has a
fasting C-peptide value of 1.2 ng/ml, indicating that she has MODY rather than insulin-dependent diabetes
mellitus.

FIG. 22. Comparison of the sequence of the promoter region of the human, rat, mouse, chicken
and frog HNF-1α genes (SEQ ID NO:134; SEQ ID NO:138; SEQ ID NO:136; SEQ ID NO:132; SEQ ID
NO:133, respectively). The A→C substitution at nucleotide -58 and HNF-4α binding site are shown.
Residues identical to the human sequence are boxed. Nucleotides are numbered relative to the
transcriptional start site of the human gene (indicated by an asterisk). The boxed ATG triplet is the
initiating methionine. The dashes indicate gaps introduced in the sequences to generate this alignment.

FIG. 23. Summary of mutations in the human HNF-1α gene. This cartoon shows the exons and
promoter region as boxes. The mutations and amino acid polymorphisms are from Yamagata et al., 1996;
Lehto et al., 1997; Kaisaki et al., 1997; Vaxillaire et al., 1997; Frayling et al., 1997; Hansen et al.,
1997; Urhammer et al., 1997; Glucksmann et al., 1997. The amino acid polymorphisms are I/L27, A/V98
and S/N487. The single-letter abbreviations for the amino acids are used.

FIG. 24. Partial sequence of exon 2 of HNF-1β gene of subject J2-20 (SEQ ID NO:146 and SEQ
ID NO:147). The C→T mutation in codon 177 is indicated.

FIG. 25. J2-20 pedigree. Individuals with diabetes mellitus are noted by filled symbols. The
arrow indicates the proband. The present age, age at diagnosis, current treatment and complications are
shown. The HNF-1β genotype is noted: N, normal; M, mutant. OHA, oral hypoglycemic agent; PDR,
proliferative diabetic retinopathy; CRF, chronic renal failure; and DKA, diabetic ketoacidosis.

FIG. 26A-FIG. 26M. Partial sequence of human HNF1α gene (SEQ ID NO:126 and SEQ ID
NO:127). These figures depict a contiguous sequence and have been split into panels due to the size of the
sequence. The nucleotide and predicted amino acid sequences are shown. Exon and intron sequences are
in uppercase and lowercase, respectively. The approximate size of the gaps in the introns, the complete
sequence of which was not determined, are noted. In the promoter region, potential binding sites for
transcription factors that may regulate expression of this gene are indicated, with sites identified
by DNase footprinting in italics and those identified by sequence homology in normal type. The [...]
promoter region is shown in boldface type. The polymorphisms and mutations [...]
are shown in boldface type with the designation [...]. The [...]
notes the predicted transcriptional start site based on studies of rat HNF1α [...]
that the sequence was ambiguous at this site.

FIG. 27A-FIG. 27I. Partial sequence of human HNF1β gene (SEQ ID NO:128, SEQ ID NO:129
and SEQ ID NO:139). These figures depict a contiguous sequence and have been split into panels due to
the size of the sequence. The nucleotide and predicted amino acid sequences are shown. Exon and intron
sequences are in uppercase and lowercase, respectively. The approximate size of the gaps in the
introns, the complete sequence of which was not determined, are noted. In the promoter region, potential
binding sites for transcription factors that may regulate expression of this gene are indicated, with sites
identified by DNase footprinting in italics and those identified by sequence homology in normal type.

FIG. 28A-FIG. 28V. Partial sequence of human HNF4α gene (SEQ ID NO:130, SEQ ID NO:131
and SEQ ID NO:140). These depict a contiguous sequence and have been split into panels due to the size
of the sequence. The nucleotide and predicted amino acid sequences are shown. Exon and intron
sequences are in uppercase and lowercase, respectively.

DESCRIPTION OF ILLUSTRATIVE EMBODIMENTS

The present invention concerns the early detection, diagnosis, prognosis and treatment of diabetes.
The present invention describes for the first time mutations responsible for HNF1α, HNF1β and HNF4α-
related diabetes. The specific mutations and identity of the corresponding wild-type genes from [...]
subjects are disclosed. These mutations are indicators of HNF1α, HNF1β and HNF4α-related diabetes and
are diagnostic of the potential for the development of diabetes. [...]
disclosed herein will also be used to identify other gene mutations responsible for other forms of diabetes.

Those skilled in the art will realize that the nucleic acid sequences disclosed [...]
variety of applications in diabetes detection, diagnosis, prognosis and treatment. Examples of such
applications within the scope of the present invention include amplification of [...] using
specific primers; detection of markers of HNF1α, HNF1β and HNF4α by hybridization with oligonucleotide
probes; incorporation of isolated nucleic acids into vectors and expression of vector-incorporated nucleic
acids as RNA and protein; development of immunologic reagents corresponding to gene encoded products;
and therapeutic treatment for the identified MODY using these reagents as well as anti-sense nucleic acids,
or other inhibitors specific for the identified MODY. The present invention further discloses screening assays
for compounds to upregulate gene expression or to combat the effects of the mutant HNF1α, HNF1β and
HNF4α genes.

A. DIABETES AND MODY 

Diabetes mellitus affects approximately 5% of the population of the United States and over 100
million people worldwide (King et al., 1988; Harris et al., 1992). A better way of identifying the populace
who are at risk of developing diabetes is needed as a subject may have normal plasma glucose
compositions but may be at risk of developing overt diabetes. These issues could be resolved if it were
possible to diagnose susceptible people before the onset of overt diabetes. This is presently not possible
with subjects having classical diabetes due to its multifactorial nature.
1 5 MODY is a monogenic form of diabetes and thus the genes responsible can be more easily studied 

than those whose mutation contributes to the development of polygenic form(s) of this disorder such as 
type 1 and type 2 diabetes mellitus. Recent studies have shown that subjects with maturity onset 
diabetes of the young (MODY), a subset of diabetes characterized by diabetes in the first or second 
decade of life and autosomal dominant inheritance have shown that MODY may result from mutations in 
20 genes on chromosome 20 (HNF4a/M0DY1), chromosome 7 (glucokinase|MODY2) chromosome 12 
(HNF1<x|M0DY3) and chromosoem 17 (HNF1p|M0DY4). 

The clinical characteristics that manifest in HNF4α, HNF1α and HNF1β type diabetes resemble
those seen in patients with type 2 diabetes. These characteristics include frequent severe fasting
hyperglycemia, the need for oral hypoglycemic agents, eventual insulin requirements, and vascular and
neuropathic complications (Fajans et al., 1994; Menzel et al., 1995).

The inventors have shown that prediabetic subjects with mutations in the HNF1α and HNF4α
genes have subtle but important alterations in the normal pattern of glucose-stimulated insulin secretion.
Compared to control subjects with no family history of diabetes, they had normal insulin secretion rates
at lower glucose concentrations. However, the increase in insulin secretion rate resulting from an increase
in the plasma glucose concentration above 8 mM was less in prediabetic HNF1α-mutation subjects than in
controls (see FIG. 2 to FIG. 4).

Exposure of the normal β-cell to increased plasma glucose concentrations for 42 hours results in
an increase in β-cell responsiveness to a subsequent glucose stimulus. Following a 42-hr glucose infusion
which raised the plasma glucose concentration to an average value of 7.1 ± 1.4 mM, the insulin secretion
rate of prediabetic HNF1α-mutation subjects increased by 35% between 5 and 9 mM glucose, with a resultant
shift in the dose-response curve to the left. Five out of six prediabetic HNF1α-mutation subjects showed
this increase in insulin secretion rate, and only one subject, MD13, failed to demonstrate this effect. The
magnitude of this priming effect of glucose was similar to that seen in the controls.

Diabetic HNF1α-mutation subjects demonstrated diminished insulin secretion across the entire
range of glucose concentrations studied. Thus, over the concentration range between 5 and 9 mM
glucose, the diabetic subjects secreted 50% less insulin than the controls and 51% less than the
prediabetic HNF1α-mutation subjects. Furthermore, the priming effect of glucose was lost in the
subjects with overt diabetes.

Evaluation of insulin resistance indicated that HNF1α-mutation subjects were no more resistant
than the controls. In fact, there was a tendency towards a lesser degree of insulin resistance in the
HNF1α-mutation subjects, making it highly unlikely that insulin resistance plays a primary role in the
pathophysiology of diabetes in these subjects.

The inventors have recently characterized insulin secretory responses in prediabetic HNF4α- and
HNF1α-mutation subjects. Prediabetic HNF4α- and HNF1α-mutation subjects both have reduced insulin
secretory responses to glucose which are evident only as the plasma glucose rises above a threshold of 7
or 8 mM, respectively. Whereas in HNF1α-mutation subjects the priming effect of glucose on insulin
secretion is retained, a low-dose glucose infusion did not have any significant effect on insulin secretion
in prediabetic HNF4α-mutation subjects (Byrne et al., 1995b). In subjects with mutations in the
glucokinase gene, the dose-response curve is shifted to the right and the insulin secretion rate (ISR) is
markedly decreased at glucose concentrations below 7 mM, but insulin secretion continues to increase with
increasing plasma glucose concentrations even above levels of 8 mM. The priming effect of glucose on
insulin secretion also is preserved (Byrne et al., 1994). The inventors have recently performed similar
studies in subjects with classical Type 2 diabetes and impaired glucose tolerance (IGT). In subjects with
IGT, although the dose-response curve relating glucose and insulin secretion was shifted to the right, the
priming effect of glucose on insulin secretion was retained. In subjects with overt Type 2 diabetes, the
increase in insulin secretion in response to an increase in glucose was markedly reduced and the priming
effect of glucose on insulin secretion was lost.

It thus appears that β-cell dysfunction plays an important pathophysiologic role in the
development of the three forms of MODY which have been characterized to date. A clear prediabetic
phase has not been identified in subjects with glucokinase mutations. However, profound defects in the
ability of the β-cell to respond to a glucose stimulus are present even in the face of the mild elevations in
glucose which characterize the majority of these subjects. By contrast, a prediabetic phase is a feature
of the HNF4α and HNF1α forms of diabetes. These prediabetic subjects have reduced insulin secretory
responses to elevated concentrations of glucose induced by the step-wise glucose infusion prior to onset
of diabetes. Prediabetic HNF4α and HNF1α subjects can be distinguished based on the effects of a
low-dose glucose infusion on insulin secretion. The priming effect of glucose on insulin secretion is retained in
HNF1α subjects in the prediabetic phase but is lost after the onset of overt hyperglycemia, whereas this
priming effect is absent in HNF4α diabetes even in the prediabetic phase of the disease. The severe
reductions in insulin secretory responses to glucose seen in the overtly diabetic HNF1α subjects are likely
to be due in part to the effects of high glucose, in view of the well documented adverse effects of
hyperglycemia on insulin secretion. A full understanding of the reasons for these changes in the
dose-response relationships between glucose and insulin secretion requires a better understanding of the roles
of HNF4α and HNF1α in regulating normal pancreatic β-cell function.

Further studies by the inventors have shown that elevations in the 2-hr post-challenge blood
glucose levels predict alterations in insulin secretory responses to glucose. However, in that case,
subjects with impaired glucose tolerance demonstrated reduced insulin secretory responses over a range
of glucose concentrations and not just in response to increases in glucose above 8 mM as was seen in the
prediabetic HNF1α-mutation subjects. Thus, the inventors do not believe that the alterations in insulin
secretion seen in the prediabetic HNF1α subjects resulted from the modest elevations in glucose. Rather,
the inventors' results suggest that the percent priming and overall insulin secretion rates deteriorate as
glucose tolerance deteriorates, and the lack of ability to increase insulin secretion at high glucose levels is
a feature of the mutation in the HNF1α gene.

From the studies described above and in the Examples that follow, it is clear that the identification
and characterization of the gene(s) associated with MODY diabetes is important. Mutations in such
genes lead to diabetes and it would be diagnostically and therapeutically advantageous to identify the
mutations in subjects predisposed to such mutations.

Studies attempting to find the location of the MODY3 gene showed that the putative gene linked
to MODY3 type diabetes was localized to a 5 cM interval between the markers D12S86 and
D12S807/D12S820 (Menzel et al., 1995). However, the identity of the gene had not been elucidated.
The present invention for the first time shows that the gene linked to MODY3 expresses a factor
previously identified from hepatocytes, known as hepatocyte nuclear factor 1α, herein referred to as
HNF1α.

Similarly, studies attempting to find the location of the MODY1 gene showed that the putative
gene linked to MODY1 type diabetes was localized to a 13 cM interval between the markers D20S169
and D20S176 (Stoffel et al., 1996). As with MODY3, the identity of the gene in MODY1 had
not been elucidated. The present invention for the first time shows that the gene linked to MODY1
expresses a factor previously identified from hepatocytes, known as hepatocyte nuclear factor 4α, herein
referred to as HNF4α.

Subsequently, the inventors performed studies to elucidate the genetic defects responsible for
other forms of MODY. The present invention for the first time shows that MODY is likely a consequence
of mutations in hepatocyte nuclear factor 1β, herein referred to as HNF1β.

The association of mutations in HNF1α, HNF1β and HNF4α with diabetes indicates the
importance of the HNF network in controlling pancreatic β-cell function and glucose homeostasis. Hence
the studies presented here have categorized exemplary mutations in the HNF1α, HNF1β and HNF4α genes
as identified by PCR techniques. These landmark results form the basis of many therapeutic and
diagnostic techniques as measures to alleviate diabetes, particularly HNF1α-diabetes, HNF1β-diabetes
and HNF4α-diabetes.

B. HEPATOCYTE NUCLEAR FACTORS ARE THE GENES LINKED TO MODY TYPE DIABETES 
Hepatocyte Nuclear Factor 1α

Hepatic nuclear factor 1α (also known as APF, LFB1 or HP1) has been described as a
sequence-specific DNA binding protein from rat liver. It is thought to interact with promoter elements present in
many genes including albumin, α- and β-fibrinogen, α-1-antitrypsin, α-fetoprotein, pyruvate kinase,
transthyretin and aldolase B, among others. HNF1α has been purified from rat liver extracts by DNA
affinity chromatography using a fibrinogen promoter element (Courtois et al., 1987) and was characterized as a
single 88 kDa protein. It is now known that HNF1α is a transcription factor.

Mendel and Crabtree (1993) suggested that HNF1α interacted with "hepatocyte-specific" genes
in which it plays a prominent role in regulation of both in vitro and in vivo transcription. However, it was
later shown that HNF1α mRNA can also be found in several non-hepatocyte tissues including the kidney,
stomach, intestines, thymus, spleen and pancreas (Baumhueter et al., 1990; Kuo et al., 1990). This
suggests that HNF1α expression may participate in the differentiation of non-hepatic organs as well as



Transcription factors are proteins that control transcription by binding to cis-acting regulatory
DNA sequences in a gene. As such, these factors play a crucial role in development and differentiation by
dictating the pattern of expression of genes within specific cells and tissues.

The homeodomain proteins are a class of transcription factors. These proteins all possess the
unusual characteristic of having very similar DNA-binding domains even though they mediate diverse
effects. HNF1α is an example of a homeodomain protein. HNF1α has been shown to dimerize with itself
in solution. It appears that maximal transcriptional activation by HNF1α requires a novel dimerization
cofactor. This cofactor, known as the dimerization cofactor of HNF1α (DCoH), does not itself bind
DNA; rather, it binds HNF1α.

HNF1α binds to DNA as a dimer; this was confirmed from studies on the purification and cloning
of HNF1α. Other studies showed that there was a DNA binding protein that binds to the HNF1α binding
site in cells that lack HNF1α mRNA. This second protein, HNF1β, is a homolog of HNF1α but is the
product of a separate gene.

Regulation studies of the HNF1α promoter showed that binding sites for transcription factors
HNF3, AP1 and HNF4α are essential for the expression of HNF1α (Hansen and Crabtree, 1993). It has
been demonstrated that HNF4α is located on chromosome 20 of the human genome. The present
inventors suggest that MODY1, which is known to be linked to chromosome 20, may act as a regulator of
MODY3 gene expression; as such, mutations in HNF4α may be responsible for the MODY1 form of diabetes.

HNF1α proteins possess three functional regions, namely, the dimerization, activation and
DNA-binding domains. The dimerization domain is localized to the first 32 amino acids of the HNF1α proteins.
The DNA-binding domain is a POU-like homeodomain which binds to a 13 bp palindromic DNA sequence in
the promoters of genes that bind HNF1α (Courtois et al., 1988; Frain et al., 1989). The consensus
sequence for this HNF1α binding site on these genes is:

GTTAATNATTACC (SEQ ID NO:9)
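
By way of illustration only, a consensus of this kind can be used to scan a nucleotide sequence for candidate binding sites. The following Python sketch is not part of the disclosed methods. It assumes exact matching against the consensus as quoted above, with N standing for any of A, C, G or T; the function names and the example fragment are hypothetical and were chosen for this illustration.

```python
import re

# Consensus exactly as quoted in the text (SEQ ID NO:9); N matches any base.
HNF1A_CONSENSUS = "GTTAATNATTACC"


def consensus_to_regex(consensus: str) -> re.Pattern:
    """Translate a consensus containing N wildcards into a compiled regex."""
    pattern = "".join("[ACGT]" if base == "N" else base for base in consensus.upper())
    # The lookahead allows overlapping sites to be reported.
    return re.compile(f"(?=({pattern}))")


def find_sites(dna: str, consensus: str = HNF1A_CONSENSUS) -> list:
    """Return (0-based start, matched sequence) for each exact consensus match."""
    regex = consensus_to_regex(consensus)
    return [(m.start(), m.group(1)) for m in regex.finditer(dna.upper())]


# Hypothetical promoter fragment, invented for illustration only.
example_fragment = "ccatGTTAATCATTACCgga"
print(find_sites(example_fragment))
```

Only the forward strand is searched and no mismatches are tolerated; naturally occurring sites often deviate from a consensus, so a practical search would likely score partial matches as well.
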
Diabetes mellitus alters the transcription of numerous genes in many different tissues. The
mechanisms underlying these alterations in transcription are largely unknown. One example of altered
transcription is seen in the reduced transcription of the albumin gene in diabetes (Wanke et al., 1991).
Recently, it has been demonstrated that HNF1α protein levels are reduced in diabetes, leading to the
theory that decreased gene transcription in diabetes is due to decreased levels of HNF1α, a factor critical
for the regulation of hepatic albumin gene expression. This is thought to be the case in other genes that
possess an HNF1α binding site and are affected by diabetes. Therefore, changes in the abundance of
HNF1α in diabetes appear to affect the expression of genes whose expression is predominantly
regulated by this factor.

The expression of the insulin gene in adult mammals is localized to the β cells in the pancreatic
islets. Studies of this gene have defined a small region in the promoter, the FF-minienhancer, capable of
conferring tissue-specific and glucose-responsive transcriptional activity on a heterologous promoter
(German et al., 1990). This minienhancer region is composed of two primary regulatory elements, the Far
box and the FLAT element, which interact to upregulate transcription.

Further analysis of the FLAT element showed it to be a cluster of several cis loci that mediate
discrete positive and negative effects. The positive locus is characterized as FLAT-F and its activity is
only revealed when there is a mutation in the negative locus FLAT-E. This FLAT-F region is able to
specifically bind a number of DNA-binding proteins. The sequence of FLAT-F has significant similarity to
the consensus sequence of HNF1α. This led to studies to determine whether HNF1α itself may play a
role in the transcriptional regulation of the rat insulin gene. Subsequently, it was shown that HNF1α
expression is present in the pancreatic β-cell-derived insulinoma cell line HIT. HNF1α has been shown to
bind with and transactivate rat insulin gene enhancers that contain an HNF1α site.

Hepatocyte Nuclear Factor 4α

Hepatocyte nuclear factor 4α (HNF4α) is another transcription factor first associated with the
liver and having limited tissue distribution (Xanthopoulos et al., 1991; Zhong et al., 1994). HNF4α can
activate transcription in several non-hepatic cell lines, indicating that no liver-specific modification is
required for its function (Sladek et al., 1990).

It has been observed that there is an apparent contradiction between the molecular mass of
HNF4α predicted from the primary sequence (50.6 kDa) (Sladek et al., 1990) and that determined by gel
electrophoresis (54 kDa), suggesting that this difference may be due to post-translational modifications.
Of the many types of post-translational modifications that might regulate gene expression, most attention
has been focused on phosphorylation, which can influence transcription factor activity in many ways
(Hunter and Karin, 1992).

Three main levels of regulation have been described: phosphorylation can affect the DNA-binding
activity (Boyle et al., 1991; Segil et al., 1991; Shuai et al., 1994), the transcriptional activation potential
(Yamamoto et al., 1988; Trautwein et al., 1993), or the translocation of a transcription factor from the
cytoplasm into the nucleus (Metz and Ziff, 1991; Kerr et al., 1991; Schindler et al., 1992; Shuai et al.,
1992). These possibilities are by no means mutually exclusive, and in principle phosphorylation can be
responsible for simultaneous regulation at several distinct levels. With the exception of certain signal
transduction proteins (Darnell et al., 1994), all examples of this type of regulation have involved
phosphorylation at serine or threonine residues.

It has been demonstrated that the activity of HNF4α is post-translationally regulated by tyrosine
phosphorylation, providing an example of a non-signal transduction factor modulated by this modification.
The HNF4α polypeptide (SEQ ID NO:79) contains 12 tyrosine residues scattered throughout the
DNA-binding, dimerization, and putative ligand-binding domains (Sladek et al., 1990) which could be potential
phosphorylation sites. It seems that the tyrosine phosphorylation of HNF4α is required for its
DNA-binding activity. It has been shown that the transcriptionally active form of HNF4α is localized in specific
subnuclear domains. This intranuclear distribution depends directly or indirectly on tyrosine
phosphorylation, suggesting the existence of an additional control mechanism at the level of subnuclear
targeting playing a role in transcription regulation.

Hepatocyte nuclear factor 4α (HNF4α) is a positive-acting transcription factor which is
expressed very early in embryo development and is essential to liver development and function (reviewed
in Sladek, 1993 and Sladek, 1994). Mouse HNF4α mRNA appears in the primary endoderm of implanting
blastocysts at embryonic day 4.5 and in the liver and gut primordia at day 8.5 (Duncan et al., 1994),
while mice deficient in HNF4α do not survive past day 9 postcoitus (Chen et al., 1994).

HNF4α has also been proposed to be responsible for the final commitment for cells to
differentiate into hepatocytes (Nagy et al., 1994). In adult rodents, HNF4α is located primarily in the
liver, kidney, and intestine, and in insects HNF4α is found in the equivalent tissues (Sladek et al., 1990;
Zhong et al., 1993). HNF4α is known to activate a wide variety of essential genes, including those
involved in cholesterol, fatty acid, and glucose metabolism; blood coagulation; detoxification mechanisms;
hepatitis B virus infections; and liver differentiation (reviewed in Sladek, 1993 and Sladek, 1994).

HNF4α is a member of the superfamily of ligand-dependent transcription factors, which includes
the steroid hormone receptors, thyroid hormone receptor (TR), vitamin A receptor, and vitamin D receptor
(VDR), as well as a large number of receptors for which ligands have not yet been identified, the so-called
orphan receptors (reviewed in Landers and Spelsberg, 1992; O'Malley and Conneely, 1992; Parker, 1993;
and Tsai and O'Malley, 1994). All receptors are characterized by two conserved domains: the zinc finger
region, which mediates DNA binding, and a large hydrophobic domain which mediates protein
dimerization, transactivation, and ligand binding.

Whether HNF4α responds to a ligand is not known, but it has been shown to activate
transcription in the absence of an exogenously added ligand (Hall et al., 1994; Kuo et al., 1992; Metzger
et al., 1993; Mietus et al., 1992; Reijnen et al., 1992; Sladek et al., 1990). HNF4α is also highly
conserved: the Drosophila HNF-4 shares 91% amino acid sequence identity with the rat HNF4α in
the DNA binding domain and 68% identity in the large hydrophobic domain (Zhong et al., 1993).

The members of the receptor superfamily have been classified in a variety of ways, one of which 
is by their ability to dimerize with themselves and with other members of the superfamily. For example, 
the steroid hormone receptors, glucocorticoid, mineralocorticoid, and progesterone receptors (GR, MR, 
and PR, respectively), all bind DNA and activate transcription as homodimers. They are present in the 
cytoplasm complexed with heat shock proteins (HSP) until the presence of the appropriate ligand disrupts 
the complex, allowing the receptors to translocate to the nucleus (reviewed in Freedman and Luisi, 1993;
O'Malley and Tsai, 1993; and Tsai and O'Malley, 1994). On the other hand, the retinoic acid receptor
(RAR) and retinoid X receptor (RXR) as well as the VDR, peroxisome proliferator-activated receptor
(PPAR), and TR, which do not bind HSP and reside primarily in the nucleus, all bind DNA and activate
transcription not only as homodimers but also as heterodimers (reviewed in Giguere, 1994; Parker, 1993;
and Stunnenberg, 1993). Several of the nuclear receptors bind DNA very inefficiently, if at all, as
homodimers (RXRα, RAR, VDR, TR, and PPAR) but bind DNA well as heterodimers (reviewed in Giguere,
1994 and Stunnenberg, 1993). At least two of the receptors (RAR and TR) form heterodimers in solution
with RXRα (Hermann et al., 1992; Kurokawa et al., 1993; Zhang et al., 1992).

The most common dimerization partner for all of these receptors is RXRα. The third class of
receptors identified to date reside in both the nucleus and the cytoplasm and bind DNA preferentially as
monomers (NGFI-B, FTZ-F1, steroidogenic factor 1 [SF-1], and RORα) (Giguere et al., 1995; Kurachi et
al., 1994; Ohno et al., 1994).

HNF4α is very similar to the retinoid receptors, in particular to RXRα, in both amino acid
sequence and DNA binding specificity. Mouse RXRα is 60% identical to rat HNF4α in the DNA binding
domain and 44% identical in the large hydrophobic domain. In comparison, RARα, which readily
heterodimerizes with RXRα, is 61% identical to RXRα in the DNA binding domain and only 27% identical
in the large hydrophobic domain (Mangelsdorf et al., 1992). HNF4α and RXRα have also been shown to
share response elements from at least six different genes as well as a consensus site of a direct repeat of
AGGTCA separated by one nucleotide (referred to as DR+1) (Carter et al., 1994; Carter et al., 1993;
Garcia et al., 1993; Ge et al., 1994; Hall et al., 1994; Hall et al., 1992; Kekule et al., 1993; Ladias,
1994; Lucas et al., 1991; Nakshatri and Chambon, 1994; Widom et al., 1992). The structural and
functional similarities of HNF4α and RXRα suggest that HNF4α might heterodimerize with RXRα and/or
other receptors.
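
As an illustration of the DR+1 arrangement mentioned above (two copies of AGGTCA in the same orientation separated by a single nucleotide), the following Python sketch searches a sequence for perfect direct repeats with a chosen spacer length. It is a simplified example and not part of the disclosed methods: it assumes perfect half sites, and the function names and the test fragment are hypothetical.

```python
import re

HALF_SITE = "AGGTCA"  # half site quoted in the text


def direct_repeat_regex(half_site: str, spacer: int) -> re.Pattern:
    """Build a regex for half_site + `spacer` arbitrary bases + half_site."""
    # The lookahead allows overlapping repeats to be reported.
    return re.compile(f"(?=({half_site}[ACGT]{{{spacer}}}{half_site}))")


def find_direct_repeats(dna: str, spacer: int = 1, half_site: str = HALF_SITE) -> list:
    """Return (0-based start, matched sequence) for each perfect direct repeat."""
    regex = direct_repeat_regex(half_site, spacer)
    return [(m.start(), m.group(1)) for m in regex.finditer(dna.upper())]


# Hypothetical fragment containing one DR+1 element, invented for illustration.
fragment = "ttAGGTCAaAGGTCAcc"
print(find_direct_repeats(fragment, spacer=1))
print(find_direct_repeats(fragment, spacer=4))
```

Changing the `spacer` argument gives the corresponding pattern for other direct-repeat spacings; response elements found in actual genes frequently contain imperfect half sites, which this exact-match sketch would miss.
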

Electrophoretic mobility shift analyses (EMSA) of HNF4α and RXRα proteins expressed in vivo
and in vitro showed that HNF4α in fact does not heterodimerize with RXRα on any one of a number of
response elements and that while HNF4α forms homodimers in solution in the absence of DNA, it does
not form heterodimers with RXRα. It has also been shown that HNF4α does not heterodimerize with a
number of other receptors on DNA, suggesting that the lack of heterodimerization is a general property of
HNF4α.

These studies led to the proposal that HNF4α defines a new subfamily of nuclear receptors
which are present exclusively in the nucleus, exist in solution and bind DNA as homodimers, and do not
form heterodimers with RXRα or other receptors.

HNF4α is a member of the steroid hormone receptor family. The members of this family have
been classified according to the amino acid sequence in the knuckle of the first zinc finger (referred to as
the P box), a region important for recognizing the sequence of the half site of the palindrome in hormone
response elements (Forman and Samuels, 1990). For example, members of the thyroid hormone receptor
subfamily contain the amino acid sequence EGCKG (SEQ ID NO:83) and bind to the thyroid response element
(TRE). Members of the estrogen receptor subfamily contain the amino acids EGCKA (SEQ ID NO:84) and
bind to estrogen response elements (ERE). The sequence of HNF4α is DGCKG (SEQ ID NO:85) and is
most similar to that of the thyroid hormone receptor subfamily. Despite this similarity it appears that HNF4α does
not bind TRE nor does it bind ERE, and the true ligand for HNF4α is as yet undetermined. The screening
methods of the present invention will lead one of ordinary skill in the art to elucidate such a ligand or
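
The P box comparison in the preceding paragraph can be made concrete by counting position-by-position differences between the quoted five-residue sequences. The Python sketch below is illustrative only: the mismatch count is a deliberately simple similarity measure chosen for this example, and the dictionary keys and function names are hypothetical.

```python
# P box sequences as quoted in the text.
P_BOXES = {
    "thyroid hormone receptor subfamily": "EGCKG",  # SEQ ID NO:83
    "estrogen receptor subfamily": "EGCKA",         # SEQ ID NO:84
}
HNF4A_P_BOX = "DGCKG"  # SEQ ID NO:85


def mismatches(a: str, b: str) -> int:
    """Count positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def closest_subfamily(query: str, references: dict) -> str:
    """Return the reference name whose P box has the fewest mismatches."""
    return min(references, key=lambda name: mismatches(query, references[name]))


for name, p_box in P_BOXES.items():
    print(name, p_box, mismatches(HNF4A_P_BOX, p_box))
print(closest_subfamily(HNF4A_P_BOX, P_BOXES))
```

Under this measure the HNF4α P box differs from the thyroid hormone receptor sequence at one position and from the estrogen receptor sequence at two, in line with the similarity described in the text.
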



The present invention describes the exon-intron organization and partial sequence of the human
HNF4α gene. In addition, the inventors have screened the exons, flanking introns and minimal promoter
region for mutations in a group of 57 unrelated Japanese subjects with early-onset diabetes/MODY of
unknown cause. The results of these screens suggest that mutations in the HNF4α gene may cause
early-onset diabetes/MODY in Japanese but they are less common than mutations in the HNF1α/MODY3
gene. The information presented herein on the sequence of the HNF4α gene and its promoter region will
facilitate the search for mutations in other populations and studies of the role of this gene in determining
normal pancreatic β-cell function.

Furthermore, current understanding of the MODY1 form of diabetes is based on studies of only a
single family, the R-W pedigree. Here the inventors report the identification of a second family with
MODY1 and the first in which there has been a detailed characterization of hepatic function. The present
inventors demonstrate that MODY1 is primarily a disorder of β-cell function; however, the inventors have
ascertained that mutations in HNF4α may lead to α-cell as well as β-cell secretory defects or to a
reduction in pancreatic islet mass.

Hepatic Nuclear Factor 1β

Human HNF1β is a homeodomain-containing transcription factor of 557 amino acids (type A) with
alternative splicing generating two other forms of 531 (type B) and 399 amino acids (type C) (Mendel et
al., 1991a; De Simone et al., 1991; Rey-Campos et al., 1991; Bach and Yaniv, 1993). The nucleic acid and
amino acid sequences for human HNF1β are given in SEQ ID NO:128 and SEQ ID NO:129, respectively.
HNF1β is structurally related to HNF1α and functions as a homodimer or a heterodimer with HNF1α.
These dimers are stabilized by the bifunctional protein, DCoH/PCBD (Mendel et al., 1991b; Citron et al.,
1992), which binds to the dimerization domain of HNF1, forming a heterotetrameric complex and
enhancing transcriptional activity. As a homotetramer, PCBD is involved in the regeneration of
tetrahydrobiopterin, an essential cofactor of phenylalanine hydroxylase and other mono-oxygenases,
catalyzing the conversion of 4a-hydroxytetrahydrobiopterin to quinonoid dihydrobiopterin (Citron et al.,
1993; Johnen et al., 1995). Loss-of-function mutations in PCBD are associated with a rare autosomal
recessive form of mild hyperphenylalaninemia. HNF1β and DCoH mRNA are expressed in mouse
pancreatic islets, implying that they may function together with HNF1α to regulate gene expression in
this tissue. Human DCoH is a protein of 104 amino acids (including the initiating methionine) (Thony et
al., 1995) and functions as described herein below.

MODY-type Diabetes is a Manifestation of Defects in Hepatocyte Nuclear Factors
It is established that all forms of Type 2 diabetes are associated with profound insulin secretory 
defects which include loss of the first phase response to intravenous glucose, delayed and blunted 
responses to ingestion of a mixed meal, loss of the normal oscillatory patterns of insulin secretion, and 
increased secretion of proinsulin and proinsulin-like products. The molecular basis of these secretory 
defects in humans is unknown, although in rats it has been shown that there are global changes in gene 
expression in the islets of diabetic and prediabetic animals. One such global alteration is the reduction in 
the levels of mRNAs encoding many pancreatic islet specific proteins. This defect in gene expression 
would be compatible with decreased levels of a master transcription factor whose levels affect the 
expression of a whole array of downstream genes. 

The present invention predicts that the β-cell dysfunction and insulin secretory defects
associated with MODY3 are a result of mutations in HNF1α; furthermore, it demonstrates that the β-cell
dysfunction associated with MODY1 is a result of mutations in HNF4α.

The features of MODY-type diabetes are very similar to those of late onset Type 2 diabetes.
Hence, acquired defects in the expression of HNF1α, HNF4α and HNF1β may well occur in
late onset diabetes and lead to β-cell dysfunction and insulin secretory defects in this form of diabetes.
The identification of agents that activate transcription of HNF1α, HNF1β and HNF4α will be therapeutic
for the treatment of MODY, as well as late onset Type 2 diabetes. The present invention details methods
for the identification of such agents which will then be used to increase the expression of HNF1α,
HNF1β and HNF4α which in turn will lead to the increased transcription/expression or activation of β-cell
genes such as insulin.

It is clear from the present invention that hepatocyte nuclear factors, their expression, regulation
and modification have far-reaching implications in diabetes. To date, three of the four types of MODY
diabetes identified are predicted to affect gene expression. Other forms of MODY cannot be ruled out;
for example, genetic linkage studies predict the presence of additional MODY genes, the chromosomal
localization of which is presently unknown.

The absolute HNF4α dependence of the HNF1α promoter coupled with evidence of the ability of
HNF4α to rescue endogenous HNF1α expression is indicative of HNF4α being an essential regulator of
HNF1α (FIG. 6). Thus activation or repression of HNF4α will result in an indirect activation or repression
of HNF1α. The present invention elucidates methods for identifying factors responsible for modulating
HNF4α expression and/or activity.

HNF1β, also known as vHNF1, is closely related to HNF1α and is able to form heterodimers with
HNF1α. Dimerization between members of classes of transcription factors appears to solve the problem
of controlling expression of a very large number of genes. An obvious advantage of the dimerization ability
of a transcription factor is that it provides an opportunity to diversify the number of regulatory
mechanisms that can be associated with a single regulatory DNA binding site. Another advantage lies in
the possibility of translating subtle alterations in the relative levels of expression of members of a
dimerization pair into a substantial quantitative effect on transcription.

FIG. 6 summarizes the different factors involved in the regulation of expression and activity of
the HNF transcription factors described above. From the inventors' investigations it is conceivable that
aberrations at any point along this pathway, or any factors affecting this pathway directly or indirectly,
will result in β-cell dysfunction and diabetes mellitus, either as MODY or late-onset diabetes.

The present invention has shown that mutations in HNF1α are clearly responsible for MODY3
type diabetes. As discussed earlier, HNF1α binds to DNA as a dimer; this can be either a homodimer or a
heterodimer with HNF1β (SEQ ID NO:80). The two forms of HNF1 are expressed in comparable amounts
in the liver but there is a three-fold higher expression of HNF1β in the kidney as compared to HNF1α.

HNF1β lacks the transcriptional activity attributable to HNF1α. One potential consequence of
this observation, in combination with its ability to dimerize with HNF1α, is that HNF1β is likely to be a
negative regulator of HNF1α transcriptional activity. This possibility is suggested by the presence of
vHNF1 in systems that do not express the majority of hepatocyte-specific gene products (Baumhueter et
al., 1988). However, studies by Mendel et al. (1991) were unable to confirm this observation.

Studies by Mendel et al. (1991) indicated that a dimerization cofactor of HNF1 (DCoH) may
increase the stability of HNF1α dimers. Thus, it is suggested that DCoH has the potential to restrict the
activity of HNF1α and/or HNF1β. There are a number of hypotheses as to how DCoH affects HNF1
activation of transcription. HNF1α is a monomer in solution and can only bind DNA as a dimer; the
presence of DCoH favors the formation of the dimeric HNF1α. Alternatively, it is plausible that DCoH
induces a conformational change in HNF1α to create a more potent transcriptional activator, either
directly or by allowing interaction with other proteins, for example HNF1β. Yet another alternative is
that DCoH decreases the rate of HNF1α degradation, thereby stabilizing HNF1α and potentiating the
effects of HNF1α.

The present invention demonstrates that MODY4, which was previously uncharacterized, is a
manifestation of defects in HNF1β. The present invention describes specific mutations in HNF1β that
have led to MODY4 in certain individuals. In light of these observations, there are described herein
methods for the identification and isolation of factors involved in the activity of HNF1β and DCoH with a
view to obtaining insights into therapeutic intervention in diabetes.

C. In vitro Screening Assays for Candidate Substances

Certain aspects of this invention concern methods for conveniently evaluating candidate
substances to identify compounds capable of stimulating HNF1α-, HNF1β- or HNF4α-mediated
transcription. Such compounds will be capable of promoting gene expression, and thus can be said to
have up-regulating activity. Inasmuch as increased gene expression of, for example, the insulin gene in
the body functions to alleviate the symptoms of diabetes, any positive substances identified by the
assays of the present invention will be antidiabetic drugs. Before human administration, such
compounds would be rigorously tested using conventional animal models known to those of skill in the
art.

Successful candidate substances may function in the absence of mutations in HNF1α, HNF1β or
HNF4α, in which case the candidate compound may be termed a "positive stimulator" of HNF1α, HNF1β
or HNF4α, respectively. Alternatively, such compounds may stimulate transcription in the presence of
mutated HNF1α, HNF1β or HNF4α, overcoming the effects of the mutations, i.e., function to oppose
HNF1α mutant-, and/or HNF1β-, and/or HNF4α-mediated diabetes, and thus may be termed an "HNF1α
mutant agonist," "HNF1β mutant agonist" or "HNF4α mutant agonist," respectively. Compounds may
even be discovered which combine all three of these actions. Although the agonist class of compounds
may ultimately seem to be the most desirable, compounds of either class will likely be useful therapeutic
agents for use in stimulating gene expression and combating MODY1, MODY3, MODY4, and late-onset
Type 2 diabetes in human subjects.

Candidates for HNF1α.

As HNF1α is herein shown to be linked to MODY3-type diabetes, one method by which to identify a
candidate substance capable of stimulating HNF1α-mediated transcription in diabetes is based upon
specific protein:DNA binding. Accordingly, to conduct such an assay, one may prepare an HNF1α binding
protein composition, such as recombinant HNF1α, and determine the ability of a candidate substance to
increase HNF1α protein binding to a DNA segment including a complementary HNF1α binding sequence,
i.e., to increase the amount or the binding affinity of a protein:DNA complex.

This generally would be achieved using two parallel assays, one of which contains HNF1α and
the specific DNA alone and one of which contains HNF1α, DNA and the candidate substance
composition. One would perform each assay under conditions, and for a period of time, effective to allow
the formation of protein:DNA complexes, and one would then separate the bound protein:DNA complexes
from any unbound protein or DNA and measure the amount of the protein:DNA complexes. An increase in
the amount of the bound protein:DNA complex formed in the presence of the candidate substance would
be indicative of a candidate substance capable of promoting HNF1α binding, and thus, capable of
stimulating HNF1α-mediated transcription.
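
The comparison between the two parallel assays can be sketched in a few lines of Python. This is only an illustration: the function names and the 1.5-fold cut-off are assumptions introduced here, not values taken from this specification, and the signals stand for any label read-out of bound protein:DNA complex measured in the same units for both assays.

```python
def binding_fold_change(control_signal: float, treated_signal: float) -> float:
    """Ratio of bound-complex signal with the candidate to the signal without it."""
    if control_signal <= 0:
        raise ValueError("control signal must be positive")
    return treated_signal / control_signal


def promotes_binding(control_signal: float, treated_signal: float,
                     threshold: float = 1.5) -> bool:
    """Flag a candidate whose presence increases bound complex.

    The default threshold of 1.5 is a hypothetical cut-off chosen for
    illustration only; a real assay would set it from replicate variability.
    """
    return binding_fold_change(control_signal, treated_signal) >= threshold


if __name__ == "__main__":
    # Control assay: protein + DNA alone; treated assay: protein + DNA + candidate.
    print(binding_fold_change(100.0, 250.0))
    print(promotes_binding(100.0, 110.0))
```

A candidate flagged by such a comparison would still need the follow-up evaluation described in the text.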

In such binding assays, the amount of the protein:DNA complex may be measured, after the
removal of unbound species, by detecting a label, such as a radioactive or enzymatic label, which has
been incorporated into the original HNF1α protein composition or recombinant protein or HNF1α-containing
DNA segment. Alternatively, one could detect the protein portion of the complex by means of
an antibody directed against the protein, such as those disclosed herein.

Preferred binding assays are those in which either the HNF1α protein, recombinant protein or
purified composition or the HNF1α-containing DNA segment is bound to a solid support and contacted
with the other component to allow complex formation. Unbound protein or DNA components are then
separated from the protein:DNA complexes by washing and the amount of the remaining bound complex
quantitated by detecting the label or with antibodies. Such DNA binding assays form the basis of
filter-binding and microtiter plate-type assays and can be performed in a semi-automated manner to enable
analysis of a large number of candidate substances in a short period of time. Electrophoretic methods,
such as the gel-shift assay disclosed herein, could also be employed to separate unbound protein or DNA
from bound protein:DNA complexes, but such labor-intensive methods are not preferred.

Assays such as those described above are initially directed to identifying positive stimulator
candidate substances and do not, by themselves, address the activity of the substance in the presence of
HNF1α mutants. However, such positive regulators may also prove to act as HNF1α mutant agonists,
and in any event, would likely have utility in transcriptional promotion, either in vitro or in vivo. Positive
regulators would likely be further evaluated to assess the effects of HNF1α mutants on their action, for
example, by employing a cellular reporter gene assay such as those described herein below.

Virtually any candidate substance may be analyzed by these methods, including compounds which
may interact with HNF1α binding protein(s), HNF1α or protein:DNA complexes, and also substances such
as enzymes which may act by physically altering one of the structures present. Of course, any compound
isolated from natural sources such as plants, animals or even marine, forest or soil samples, may be
assayed, as may any synthetic chemical or recombinant protein.

Another potential method for stimulating HNF1α-mediated transcription is to prepare an HNF1α
protein composition and to modify the protein composition in a manner effective to increase HNF1α
protein binding to a DNA segment including the HNF1α protein binding sequence. The binding assays
would be performed in parallel, similar to those described above, allowing the native and modified HNF1α
binding protein to be compared. In addition to phosphatases and kinases, other agents, including
proteases and chemical agents, could be employed to modify HNF1α binding protein. The present
invention, with the cloning of mutant HNF1α cDNA, also opens the way for genetically engineering
HNF1α protein to promote gene transcription in diabetes. In this regard, the mutation of potential
phosphorylation sites and/or the modification or deletion of other domains is contemplated.



The criteria shown above for screening of modulators of HNF1α are also true of HNF4α. HNF4α
is a member of the steroid hormone receptor superfamily; however, the ligand for HNF4α is unknown.
The identification of the endogenous ligand for HNF4α binding would be an important step towards
elucidating the mechanisms of eukaryotic gene control, and would also provide biomedical science with a
powerful tool by which to regulate specific gene expression. Such a development would lead to numerous
useful applications in the pharmaceutical and biotechnological industries. Although many applications are
envisioned, one particularly useful application would be as the central component in screening assays to
identify new classes of pharmacologically active substances which may be employed to manipulate, and
particularly, to promote, the transcription of genes whose expression is altered in diabetes.

Hence HNF4α would be of great use in identifying agents to combat MODY and Type 2 diabetes.
An anti-diabetic agent isolated by the screening methods of the present invention would act to promote
the cellular transcription or function of HNF4α, which would in turn serve to increase transcription of
genes whose activity is regulated by HNF4α (for example HNF1α), thereby increasing the transcription of
genes involved in diabetes and alleviating the symptoms of diabetes.



The criteria shown above for screening of modulators of HNF1α and HNF4α are also true of
HNF1β. HNF1β is a 557 amino acid protein that is structurally related to HNF1α and functions as a homodimer
and as a heterodimer with HNF1α. These dimers are stabilized by DCoH. The identification of factors that
affect this dimerization, or any of the factors involved in the heterotetrameric complex, will provide useful
compounds for the modulation of transcriptional activity. Such a development would lead to numerous
useful applications in the pharmaceutical and biotechnological industries. Although many applications are
envisioned, one particularly useful application would be as the central component in screening assays to
identify new classes of pharmacologically active substances which may be employed to manipulate, and
particularly, to promote, the transcription of genes whose expression is altered in diabetes.

Hence HNF1β would be of great use in identifying agents to combat MODY and Type 2 diabetes.
An anti-diabetic agent isolated by the screening methods of the present invention would act to promote
the cellular transcription or function of HNF1β, which would in turn serve to increase transcription of
genes whose activity is regulated by HNF1β (for example HNF1α), thereby increasing the transcription of
genes involved in diabetes and alleviating the symptoms of diabetes.

D. Reporter Genes and Cell-Based Screening Assays

Cellular assays also are available for screening candidate substances to identify those capable of
stimulating HNF1α-, HNF1β- and HNF4α-mediated transcription and gene expression. In these assays,
the increased expression of any natural or heterologous gene under the control of a functional HNF1α,
HNF1β or HNF4α protein may be employed as a measure of stimulatory activity, although the use of
reporter genes is preferred. A reporter gene is a gene that confers on its recombinant host cell a readily
detectable phenotype that emerges only under specific conditions. In the present case, the reporter gene,
being under the control of a functional HNF1α, HNF1β or HNF4α protein, will generally be repressed
under conditions of MODY3, MODY4 or MODY1 diabetes, respectively, and will generally be expressed in
the MODY3, MODY4 or MODY1 non-diabetic conditions, respectively.

Reporter genes are genes which encode a polypeptide not otherwise produced by the host cell
which is detectable by analysis of the cell culture, e.g., by fluorometric, radioisotopic or
spectrophotometric analysis of the cell culture. Exemplary enzymes include luciferases, transferases,
esterases, phosphatases, proteases (tissue plasminogen activator or urokinase), and other enzymes
capable of being detected by their physical presence or functional activity. A reporter gene often used is
chloramphenicol acetyltransferase (CAT), which may be employed with a radiolabeled substrate, or
luciferase, which is measured fluorometrically.

Another class of reporter genes which confer detectable characteristics on a host cell are those
which encode polypeptides, generally enzymes, which render their transformants resistant against toxins,
e.g., the neo gene, which protects host cells against toxic levels of the antibiotic G418, and genes
encoding dihydrofolate reductase, which confers resistance to methotrexate. Genes of this class are not
generally preferred since the phenotype (resistance) does not provide a convenient or rapid quantitative
output. Resistance to antibiotic or toxin requires days of culture to confirm, or complex assay procedures
if other than a biological determination is to be made.

Other genes of potential for use in screening assays are those capable of transforming hosts to
express unique cell surface antigens, e.g., viral env proteins such as HIV gp120 or herpes gD, which are
readily detectable by immunoassays. However, antigenic reporters are not preferred because, unlike
enzymes, they are not catalytic and thus do not amplify their signals.

The polypeptide products of the reporter gene are secreted, intracellular or, as noted above,
membrane-bound polypeptides. If the polypeptide is not ordinarily secreted, it is fused to a heterologous
signal sequence for processing and secretion. In other circumstances the signal is modified in order to
remove sequences that interdict secretion. For example, the herpes gD coat protein has been modified by
site-directed deletion of its transmembrane binding domain, thereby facilitating its secretion (EP
139,417A). This truncated form of the herpes gD protein is detectable in the culture medium by
conventional immunoassays. Preferably, however, the products of the reporter gene are lodged in the
intracellular or membrane compartments. Then they can be fixed to the culture container, e.g., microtiter
wells, in which they are grown, followed by addition of a detectable signal generating substance such as
a chromogenic substrate for reporter enzymes.

The transcriptional promotion process which, in its entirety, leads to enhanced transcription is 
termed "activation." The mechanism by which a successful candidate substance acts is not material 
since the objective is to promote HNF1α-, HNF1β- or HNF4α-mediated gene expression, or even to
promote gene expression in the presence of mutant HNF1α, HNF1β or HNF4α gene products, by
whatever means. 

To create an appropriate vector or plasmid for use in such assays one would ligate the
HNF1α-containing promoter, whether a hybrid or the native HNF1α promoter, to a DNA segment encoding the
reporter gene by conventional methods. Similar assays are also contemplated using HNF1β and HNF4α
promoters. The HNF1α, HNF1β or HNF4α promoter sequences may be obtained by in vitro synthesis or
recovered from genomic DNA and should be ligated upstream of the start codon of the reporter gene. The
present invention provides the promoter region for human HNF1α; a comparison of the sequence of the
promoter region of the human, rat, mouse, chicken and frog HNF1α genes is given in FIG. 22. There is
also provided herein a comparison of the sequences of the promoter regions of the human and mouse
HNF4α genes (FIG. 13). The partial sequence of the human HNF1β gene including promoter has also
been identified by the present inventors and deposited in the GenBank database under accession numbers
U90279-90287 and U96079. Any of these promoters may be particularly preferred in the present
invention. An AT-rich TATA box region should also be employed and should be located between the HNF
sequence and the reporter gene start codon. The region 3' to the coding sequence for the reporter gene
will ideally contain a transcription termination and polyadenylation site. The promoter and reporter gene
may be inserted into a replicable vector and transfected into a cloning host such as E. coli, the host
cultured and the replicated vector recovered in order to prepare sufficient quantities of the construction
for later transfection into a suitable eukaryotic host.

Host cells for use in the screening assays of the present invention will generally be mammalian
cells, and are preferably cell lines which may be used in connection with transient transfection studies.
Cell lines should be relatively easy to grow in large scale culture. Also, they should contain as little native
background as possible considering the nature of the reporter polypeptide. Examples include the Hep G2,
VERO, HeLa, human embryonic kidney (HEK)-293, CHO, WI38, BHK, COS-7, and MDCK cell lines, with
monkey CV-1 cells being particularly preferred.

The screening assay typically is conducted by growing recombinant host cells in the presence and
absence of candidate substances and determining the amount or the activity of the reporter gene. To
assay for candidate substances capable of exerting their effects in the presence of mutated HNF1α,
HNF1β and/or HNF4α gene products, one would make serial molar proportions of such gene products that
alter HNF1α-, HNF1β- and HNF4α-mediated expression. One would ideally measure the reporter signal
level after an incubation period that is sufficient to demonstrate mutant-mediated repression of signal
expression in controls incubated solely with mutants. Cells containing varying proportions of candidate
substances would then be evaluated for signal activation in comparison to the suppressed levels.

Candidates that demonstrate dose-related enhancement of reporter gene transcription or
expression are then selected for further evaluation as clinical therapeutic agents. The stimulation of
transcription may be observed in the absence of mutant HNF1α, HNF1β or HNF4α, in which case the
candidate compound might be a positive stimulator of HNF1α, HNF1β or HNF4α transcription,
respectively. Alternatively, the candidate compound might only give a stimulation in the presence of
mutated HNF1α, mutated HNF1β or mutated HNF4α protein, which would indicate that it functions to
oppose the mutation-mediated suppression of the gene expression. Candidate compounds of either class
might be useful therapeutic agents that would stimulate gene expression and thereby combat MODY
and Type 2 diabetes.
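
As an illustration of how a dose-related enhancement of reporter signal might be recognized, the following Python sketch checks that the signal does not fall as the dose rises and that the top dose exceeds the baseline by a minimum factor. The function name, the requirement that the lowest dose serve as the baseline, and the 1.5-fold default are assumptions made for this example and are not taken from this specification.

```python
from typing import Sequence


def is_dose_related(doses: Sequence[float], signals: Sequence[float],
                    min_fold: float = 1.5) -> bool:
    """Return True when reporter signal rises with dose.

    Assumes the lowest dose (ideally zero, i.e. no candidate) is the baseline.
    `min_fold` is a hypothetical cut-off for the top dose versus the baseline.
    """
    if len(doses) != len(signals) or len(doses) < 2:
        raise ValueError("need at least two matched dose/signal pairs")
    # Order the read-outs by increasing dose.
    ordered = [signal for _, signal in sorted(zip(doses, signals))]
    non_decreasing = all(b >= a for a, b in zip(ordered, ordered[1:]))
    return non_decreasing and ordered[-1] >= min_fold * ordered[0]


if __name__ == "__main__":
    # Signals normalized to the untreated control (dose 0).
    print(is_dose_related([0, 1, 10, 100], [1.0, 1.4, 2.2, 3.5]))
    print(is_dose_related([0, 1, 10], [1.0, 3.0, 1.2]))
```

The same check could be run separately on cells with and without the mutant gene products to distinguish the two classes of candidate described above.
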

E. Nucleic Acids

As described in the Examples, the present invention discloses the gene at the MODY3 locus of
chromosome 12, the MODY4 locus as being associated with HNF1β, and the gene at the MODY1 locus of
chromosome 20. Mutations in these genes are responsible for diabetes. The present invention discloses
mutations in the HNF1α, HNF1β and HNF4α genes identified by PCR techniques. The gene for the MODY3
locus has for the first time been identified as hepatocyte nuclear factor 1α, herein referred to as HNF1α.
The gene for the MODY1 locus has been identified as hepatocyte nuclear factor 4α (HNF4α). The gene for
the MODY4 locus has been identified as hepatocyte nuclear factor 1β (HNF1β).

In one embodiment of the present invention, the nucleic acid sequences disclosed herein find utility 
as hybridization probes or amplification primers. In certain embodiments, these probes and primers consist 
of oligonucleotide fragments. Such fragments should be of sufficient length to provide specific hybridization
to an RNA or DNA sample extracted from tissue. The sequences typically will be 10-20 nucleotides, but
may be longer. Longer sequences, e.g., 40, 50, 100, 500 and even up to full length, are preferred for certain
embodiments. 

Nucleic acid molecules having contiguous stretches of about 10, 15, 17, 20, 30, 40, 50, 60, 75 or
100 or 500 nucleotides from a sequence selected from the group comprising SEQ ID NO:1, SEQ ID NO:3,
SEQ ID NO:5, SEQ ID NO:7, HNF1α and its mutants are contemplated. In other embodiments nucleotides
from a sequence selected from the group comprising SEQ ID NO:78, SEQ ID NO:34, SEQ ID NO:36, SEQ ID
NO:38, SEQ ID NO:40, SEQ ID NO:42, SEQ ID NO:44, SEQ ID NO:46, SEQ ID NO:48, SEQ ID NO:50, SEQ ID
NO:52, SEQ ID NO:54, HNF4α and its mutants are contemplated. In still other embodiments nucleotides
from a sequence selected from the group comprising SEQ ID NO:-, SEQ ID NO:-, SEQ ID NO:-, SEQ ID
NO:-, SEQ ID NO:-, SEQ ID NO:-, SEQ ID NO:-, SEQ ID NO:-, SEQ ID NO:-, SEQ ID NO:-, SEQ ID NO:-,
SEQ ID NO:-, HNF1β and its mutants are contemplated. Molecules that are complementary to the above
mentioned sequences and that bind to these sequences under high stringency conditions also are
contemplated. These probes will be useful in a variety of hybridization embodiments, such as Southern and
northern blotting. In some cases, it is contemplated that probes may be used that hybridize to multiple target
sequences without compromising their ability to effectively diagnose diabetes (MODY1, MODY3, and
MODY4). In certain embodiments, it is contemplated that multiple probes may be used for hybridization to a
single sample.

Various probes and primers can be designed around the disclosed nucleotide sequences. Primers
may be of any length but, typically, are 10-20 bases in length. By assigning numeric values to a sequence,
for example, the first residue is 1, the second residue is 2, etc., an algorithm defining all primers can be
proposed:

n to n + y

where n is an integer from 1 to the last number of the sequence and y is the length of the primer
minus one, where n + y does not exceed the last number of the sequence. Thus, for a 10-mer, the probes
correspond to bases 1 to 10, 2 to 11, 3 to 12 ... and so on. For a 15-mer, the probes correspond to bases 1
to 15, 2 to 16, 3 to 17 ... and so on. For a 20-mer, the probes correspond to bases 1 to 20, 2 to 21, 3 to 22
... and so on.
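
The "n to n + y" rule can be written out directly. The following Python sketch enumerates every window for a given sequence length and primer length using the same 1-based numbering as the text; the function names and the short example sequence are illustrative assumptions and do not come from the sequence listing.

```python
from typing import List, Tuple


def primer_windows(seq_length: int, primer_length: int) -> List[Tuple[int, int]]:
    """All (n, n + y) pairs, 1-based and inclusive, with y = primer_length - 1."""
    if primer_length < 1:
        raise ValueError("primer length must be at least 1")
    y = primer_length - 1
    # n runs from 1 up to the last start for which n + y stays within the sequence.
    return [(n, n + y) for n in range(1, seq_length - y + 1)]


def primers(sequence: str, primer_length: int) -> List[str]:
    """The primer sequences corresponding to each window."""
    return [sequence[start - 1:end]
            for start, end in primer_windows(len(sequence), primer_length)]


if __name__ == "__main__":
    # A made-up 12-base sequence, used only to show the windows for a 10-mer.
    print(primer_windows(12, 10))
    print(primers("ACGTACGTACGT", 10))
```

A sequence of length L therefore yields L - y windows for a primer of length y + 1, and none when the primer is longer than the sequence.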

The values of n in the algorithm above for the nucleic acid sequences are: SEQ ID NO:1, n=3238 for
HNF1α; SEQ ID NO:78, n=1441 for HNF4α; SEQ ID NO:1 28 for HNF1β.

The use of a hybridization probe of between 17 and 100 nucleotides in length allows the formation 
of a duplex molecule that is both stable and selective. Molecules having complementary sequences over 
stretches greater than 20 bases in length are generally preferred, in order to increase stability and selectivity 
of the hybrid, and thereby improve the quality and degree of particular hybrid molecules obtained. One will 
generally prefer to design nucleic acid molecules having stretches of 20 to 30 nucleotides, or even longer 
where desired. Such fragments may be readily prepared by, for example, directly synthesizing the fragment 
by chemical means or by introducing selected sequences into recombinant vectors for recombinant 
production. 

Accordingly, the nucleotide sequences of the invention may be used for their ability to selectively 
form duplex molecules with complementary stretches of genes or RNAs or to provide primers for 
amplification of DNA or RNA from tissues. Depending on the application envisioned, one will desire to employ 
varying conditions of hybridization to achieve varying degrees of selectivity of probe towards target 
sequence. 

For applications requiring high selectivity, one will typically desire to employ relatively stringent
conditions to form the hybrids, e.g., one will select relatively low salt and/or high temperature conditions,
such as provided by about 0.02 M to about 0.10 M NaCl at temperatures of about 50°C to about 70°C.
Such high stringency conditions tolerate little, if any, mismatch between the probe and the template or
target strand, and would be particularly suitable for isolating specific genes or detecting specific mRNA
transcripts. It is generally appreciated that conditions can be rendered more stringent by the addition of
increasing amounts of formamide.

For certain applications, for example, substitution of nucleotides by site-directed mutagenesis, it is 
appreciated that lower stringency conditions are required. Under these conditions, hybridization may occur 
even though the sequences of probe and target strand are not perfectly complementary, but are mismatched 
at one or more positions. Conditions may be rendered less stringent by increasing salt concentration and 
decreasing temperature. For example, a medium stringency condition could be provided by about 0.1 to 0.25
M NaCl at temperatures of about 37°C to about 55°C, while a low stringency condition could be provided
by about 0.15 M to about 0.9 M salt, at temperatures ranging from about 20°C to about 55°C. Thus, 
hybridization conditions can be readily manipulated depending on the desired results. 
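
The salt and temperature ranges quoted above for high, medium and low stringency can be collected into a small lookup, sketched here in Python. The ranges are the approximate ones stated in the text; the data structure and function name are assumptions for illustration, and because the quoted ranges overlap, the function returns every label that a given condition falls within rather than a single answer.

```python
from typing import Dict, List, Tuple

# (min salt M, max salt M, min temperature C, max temperature C), as quoted in the text.
STRINGENCY_RANGES: Dict[str, Tuple[float, float, float, float]] = {
    "high": (0.02, 0.10, 50.0, 70.0),
    "medium": (0.10, 0.25, 37.0, 55.0),
    "low": (0.15, 0.90, 20.0, 55.0),
}


def matching_stringency(salt_molar: float, temp_c: float) -> List[str]:
    """Labels whose quoted salt and temperature ranges both contain the condition."""
    return [
        label
        for label, (salt_lo, salt_hi, temp_lo, temp_hi) in STRINGENCY_RANGES.items()
        if salt_lo <= salt_molar <= salt_hi and temp_lo <= temp_c <= temp_hi
    ]


if __name__ == "__main__":
    print(matching_stringency(0.05, 65.0))  # low salt, high temperature
    print(matching_stringency(0.20, 40.0))  # falls in two overlapping ranges
```

The effect of formamide mentioned in the text is not modeled in this sketch.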

In other embodiments, hybridization may be achieved under conditions of, for example, 50 mM
Tris-HCl (pH 8.3), 75 mM KCl, 3 mM MgCl2, 1.0 mM dithiothreitol, at temperatures between approximately
20°C to about 37°C. Other hybridization conditions utilized could include approximately 10 mM Tris-HCl
(pH 8.3), 50 mM KCl, 1.5 mM MgCl2, at temperatures ranging from approximately 40°C to about 72°C.

In certain embodiments, it will be advantageous to employ nucleic acid sequences of the present
invention in combination with an appropriate means, such as a label, for determining hybridization. A wide
variety of appropriate indicator means are known in the art, including fluorescent, radioactive, enzymatic or
other ligands, such as avidin/biotin, which are capable of being detected. In preferred embodiments, one may
desire to employ a fluorescent label or an enzyme tag such as urease, alkaline phosphatase or peroxidase,
instead of radioactive or other environmentally undesirable reagents. In the case of enzyme tags, colorimetric
indicator substrates are known that can be employed to provide a detection means visible to the human eye
or spectrophotometrically, to identify specific hybridization with complementary nucleic acid-containing
samples.

In general, it is envisioned that the hybridization probes described herein will be useful both as
reagents in solution hybridization, as in PCR, for detection of expression of corresponding genes, as well as
in embodiments employing a solid phase. In embodiments involving a solid phase, the test DNA (or RNA) is
adsorbed or otherwise affixed to a selected matrix or surface. This fixed, single-stranded nucleic acid is then
subjected to hybridization with selected probes under desired conditions. The selected conditions will depend
on the particular circumstances based on the particular criteria required (depending, for example, on the
G+C content, type of target nucleic acid, source of nucleic acid, size of hybridization probe, etc.). Following
washing of the hybridized surface to remove non-specifically bound probe molecules, hybridization is
detected, or even quantified, by means of the label.

It will be understood that this invention is not limited to the particular probes disclosed herein and 
particularly is intended to encompass at least nucleic acid sequences that are hybridizable to the disclosed 
sequences or are functional analogs of these sequences. 

For applications in which the nucleic acid segments of the present invention are incorporated into 
vectors, such as plasmids, cosmids or viruses, these segments may be combined with other DNA sequences, 
such as promoters, polyadenylation signals, restriction enzyme sites, multiple cloning sites, other coding 
segments, and the like, such that their overall length may vary considerably. It is contemplated that a nucleic
acid fragment of almost any length may be employed, with the total length preferably being limited by the 
ease of preparation and use in the intended recombinant DNA protocol. 

DNA segments encoding a specific gene may be introduced into recombinant host cells and 
employed for expressing a specific structural or regulatory protein. Alternatively, through the application of 
genetic engineering techniques, subportions or derivatives of selected genes may be employed. Upstream 
regions containing regulatory regions such as promoter regions may be isolated and subsequently employed 
for expression of the selected gene. 

In an alternative embodiment, the HNF1α, HNF1β or HNF4α nucleic acids employed may actually
encode antisense constructs that hybridize, under intracellular conditions, to an HNF1α, HNF1β or HNF4α nucleic
acid, respectively. The term "antisense construct" is intended to refer to nucleic acids, preferably
oligonucleotides, that are complementary to the base sequences of a target DNA or RNA. Antisense
oligonucleotides, when introduced into a target cell, specifically bind to their target nucleic acid and
interfere with transcription, RNA processing, transport, translation and/or stability.

Antisense constructs may be designed to bind to the promoter and other control regions, exons,
introns or even exon-intron boundaries of a gene. Antisense RNA constructs, or DNA encoding such
antisense RNAs, may be employed to inhibit gene transcription or translation or both within a host cell,
either in vitro or in vivo, such as within a host animal, including a human subject. Nucleic acid sequences
which comprise "complementary nucleotides" are those which are capable of base-pairing according to
the standard Watson-Crick complementarity rules. That is, the larger purines will base pair with the
smaller pyrimidines to form combinations of guanine paired with cytosine (G:C) and adenine paired with
either thymine (A:T), in the case of DNA, or uracil (A:U), in the case of RNA. Inclusion
of less common bases such as inosine, 5-methylcytosine, 6-methyladenine, hypoxanthine and others in
hybridizing sequences does not interfere with pairing.

As used herein, the term "complementary" means nucleic acid sequences that are substantially
complementary over their entire length and have very few base mismatches. For example, nucleic acid 
sequences of fifteen bases in length may be termed complementary when they have a complementary 
nucleotide at thirteen or fourteen positions with only a single mismatch. Naturally, nucleic acid 
sequences which are "completely complementary" will be nucleic acid sequences which are entirely 
complementary throughout their entire length and have no base mismatches. 
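
The distinction between "complementary" and "completely complementary" can be made concrete by counting mismatches against the Watson-Crick partner of the target. The Python sketch below handles DNA only and equal-length sequences; the function names are illustrative, and the default tolerance of two mismatches is an assumption drawn from the fifteen-base example above (a match at thirteen or fourteen positions), not a general rule stated in this specification.

```python
_DNA_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(dna: str) -> str:
    """Watson-Crick partner of a DNA strand, written 5' to 3'."""
    return "".join(_DNA_PAIR[base] for base in reversed(dna.upper()))


def count_mismatches(probe: str, target: str) -> int:
    """Positions at which the probe fails to pair with an equal-length target."""
    if len(probe) != len(target):
        raise ValueError("probe and target must be the same length")
    expected = reverse_complement(target)
    return sum(1 for p, e in zip(probe.upper(), expected) if p != e)


def complementarity(probe: str, target: str, max_mismatches: int = 2) -> str:
    """Classify a probe; max_mismatches=2 is an assumed tolerance for a 15-mer."""
    mismatches = count_mismatches(probe, target)
    if mismatches == 0:
        return "completely complementary"
    if mismatches <= max_mismatches:
        return "complementary"
    return "not complementary"


if __name__ == "__main__":
    target = "ATGCATGCATGCATG"  # made-up 15-base target
    print(reverse_complement(target))
    print(complementarity("AATGCATGCATGCAT", target))
```

For longer sequences the tolerance would presumably be scaled with length, which this sketch leaves to the caller.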

Other sequences with lower degrees of homology also are contemplated. For example, an 
antisense construct which has limited regions of high homology, but also contains a non-homologous 
region (e.g., a ribozyme) could be designed. These molecules, though having less than 50% homology,
would bind to target sequences under appropriate conditions. 

While all or part of the HNF1α, HNF1β, or HNF4α gene sequence may be employed in the context 
of antisense construction, short oligonucleotides are easier to make and increase in vivo accessibility. 
However, both binding affinity and sequence specificity of an antisense oligonucleotide to its 
complementary target increase with increasing length. It is contemplated that antisense 
oligonucleotides of 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 
90, 100 or more base pairs will be used. One can readily determine whether a given antisense nucleic 
acid is effective at targeting of the corresponding host cell gene simply by testing the constructs in vitro 
to determine whether the endogenous gene's function is affected or whether the expression of related 
genes having complementary sequences is affected. 
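
As a rough illustration of how candidate antisense oligonucleotides of a chosen length might be enumerated before in vitro testing, the following sketch tiles a target transcript with fixed-length windows. The function name and the short transcript are hypothetical; choosing among the candidates still requires the empirical testing described above.

```python
# Pairing of each mRNA base with the DNA base of an antisense oligonucleotide.
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def antisense_candidates(mrna: str, length: int) -> list[tuple[int, str]]:
    """Return (start, oligo) pairs for every window of `length` bases.

    `start` is the 0-based position of the window in the mRNA; each oligo
    is the DNA reverse complement of that window, written 5'->3'.
    """
    mrna = mrna.upper()
    if not 1 <= length <= len(mrna):
        raise ValueError("length must be between 1 and the mRNA length")
    candidates = []
    for start in range(len(mrna) - length + 1):
        window = mrna[start:start + length]
        oligo = "".join(RNA_TO_DNA_COMPLEMENT[b] for b in reversed(window))
        candidates.append((start, oligo))
    return candidates


# Hypothetical 12-base transcript fragment tiled with 8-mers.
for start, oligo in antisense_candidates("AUGGCCAAGCUG", 8):
    print(start, oligo)
```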

In certain embodiments, one may wish to employ antisense constructs which include other 
elements, for example, those which include C-5 propyne pyrimidines. Oligonucleotides which contain C-5 
propyne analogues of uridine and cytidine have been shown to bind RNA with high affinity and to be 
potent antisense inhibitors of gene expression (Wagner et al., 1993). 

Throughout this application, the term "expression construct" is meant to include any type of 
genetic construct containing a nucleic acid coding for a gene product in which part or all of the nucleic 
acid encoding sequence is capable of being transcribed. The transcript may be translated into a protein, 
but it need not be. Thus, in certain embodiments, expression includes both transcription of a gene and 
translation of a RNA into a gene product. In other embodiments, expression only includes transcription of 
the nucleic acid, for example, to generate antisense constructs. 

In preferred embodiments, the nucleic acid is under transcriptional control of a promoter. A 
"promoter" refers to a DNA sequence recognized by the synthetic machinery of the cell, or introduced 
synthetic machinery, required to initiate the specific transcription of a gene. The phrase "under 
transcriptional control" means that the promoter is in the correct location and orientation in relation to 
the nucleic acid to control RNA polymerase initiation and expression of the gene. 

The term promoter will be used here to refer to a group of transcriptional control modules that 
are clustered around the initiation site for RNA polymerase II. Much of the thinking about how promoters 
are organized derives from analyses of several viral promoters, including those for the HSV thymidine 
kinase (tk) and SV40 early transcription units. These studies, augmented by more recent work, have 
shown that promoters are composed of discrete functional modules, each consisting of approximately 7- 
20 bp of DNA, and containing one or more recognition sites for transcriptional activator or repressor 
proteins. 

At least one module in each promoter functions to position the start site for RNA synthesis. The 
best known example of this is the TATA box, but in some promoters lacking a TATA box, such as the 
promoter for the mammalian terminal deoxynucleotidyl transferase gene and the promoter for the SV40 
late genes, a discrete element overlying the start site itself helps to fix the place of initiation. 

Additional promoter elements regulate the frequency of transcriptional initiation. Typically, these 
are located in the region 30-110 bp upstream of the start site, although a number of promoters have 
recently been shown to contain functional elements downstream of the start site as well. The spacing 
between promoter elements frequently is flexible, so that promoter function is preserved when elements 
are inverted or moved relative to one another. In the tk promoter, the spacing between promoter 
elements can be increased to 50 bp apart before activity begins to decline. Depending on the promoter it 
appears that individual elements can function either co-operatively or independently to activate 
transcription. 

The particular promoter that is employed to control the expression of a nucleic acid is not 
believed to be critical, so long as it is capable of expressing the nucleic acid in the targeted cell. Thus, 
where a human cell is targeted, it is preferable to position the nucleic acid coding region adjacent to and 
under the control of a promoter that is capable of being expressed in a human cell. Generally speaking, 
such a promoter might include either a human or viral promoter. Preferred promoters include those 
derived from HSV, and the HNF1α (see, for example, FIG. 22), HNF1β or HNF4α promoter (see, for example, 
FIG. 13). The partial sequence of the human HNF1β gene including promoter has also been identified by 
the present inventors and deposited in the GenBank database under accession numbers U90279-90287 
and U96079 (SEQ ID NO:128). Another preferred embodiment is the tetracycline controlled promoter. 

In various other embodiments, the human cytomegalovirus (CMV) immediate early gene promoter, 
the SV40 early promoter and the Rous sarcoma virus long terminal repeat can be used to obtain high-level 
expression of transgenes. The use of other viral or mammalian cellular or bacterial phage promoters 
which are well-known in the art to achieve expression of a transgene is contemplated as well, provided 
that the levels of expression are sufficient for a given purpose. Tables 1 and 2 list several 
elements/promoters which may be employed, in the context of the present invention, to regulate the 
expression of a transgene. This list is not intended to be exhaustive of all the possible elements involved 
in the promotion of transgene expression but, merely, to be exemplary thereof. 

Enhancers were originally detected as genetic elements that increased transcription from a 
promoter located at a distant position on the same molecule of DNA. This ability to act over a large 
distance had little precedent in classic studies of prokaryotic transcriptional regulation. Subsequent work 
showed that regions of DNA with enhancer activity are organized much like promoters. That is, they are 
composed of many individual elements, each of which binds to one or more transcriptional proteins. 

The basic distinction between enhancers and promoters is operational. An enhancer region as a 
whole must be able to stimulate transcription at a distance; this need not be true of a promoter region or 
its component elements. On the other hand, a promoter must have one or more elements that direct 
initiation of RNA synthesis at a particular site and in a particular orientation, whereas enhancers lack 
these specificities. Promoters and enhancers are often overlapping and contiguous, often seeming to 
have a very similar modular organization. 

Additionally, any promoter/enhancer combination (as per the Eukaryotic Promoter Data Base, 
EPDB) could also be used to drive expression of a transgene. Use of a T3, T7 or SP6 cytoplasmic 
expression system is another possible embodiment. Eukaryotic cells can support cytoplasmic 
transcription from certain bacterial promoters if the appropriate bacterial polymerase is provided, either 
as part of the delivery complex or as an additional genetic expression construct. 

TABLE 1

Immunoglobulin Heavy Chain          c-HA-ras
Immunoglobulin Light Chain          Insulin
T-Cell Receptor                     Neural Cell Adhesion Molecule (NCAM)
HLA DQα and DQβ                     α1-Antitrypsin
β-Interferon                        H2B (TH2B) Histone
Interleukin-2                       Mouse or Type I Collagen
Interleukin-2 Receptor              Glucose-Regulated Proteins (GRP94 and GRP78)
MHC Class II 5                      Rat Growth Hormone
MHC Class II HLA-DRα                Human Serum Amyloid A (SAA)
β-Actin                             Troponin I (TN I)
Muscle Creatine Kinase              Platelet-Derived Growth Factor
Prealbumin (Transthyretin)          Duchenne Muscular Dystrophy
Elastase I                          SV40
Metallothionein                     Polyoma
Collagenase                         Retroviruses
Albumin Gene                        Papilloma Virus
α-Fetoprotein                       Hepatitis B Virus
α-Globin                            Human Immunodeficiency Virus
β-Globin                            Cytomegalovirus
c-fos                               Gibbon Ape Leukemia Virus


TABLE 2

Element                               Inducer

MT II                                 Phorbol Ester (TPA); Heavy metals
MMTV (mouse mammary tumor virus)      Glucocorticoids
β-Interferon                          poly(rI)X; poly(rc)
Adenovirus 5 E2                       E1a
c-jun                                 Phorbol Ester (TPA), H2O2
Collagenase                           Phorbol Ester (TPA)
Stromelysin                           Phorbol Ester (TPA), IL-1
SV40                                  Phorbol Ester (TPA)
Murine MX Gene                        Interferon, Newcastle Disease Virus
GRP78 Gene                            A23187
α-2-Macroglobulin                     IL-6
Vimentin                              Serum
MHC Class I Gene H-2kB                Interferon
HSP70                                 E1a, SV40 Large T Antigen
Proliferin                            Phorbol Ester-TPA
Tumor Necrosis Factor                 FMA
Thyroid Stimulating Hormone α Gene    Thyroid Hormone

Use of the baculovirus system will involve high level expression from the powerful polyhedrin 
promoter. 

One will typically include a polyadenylation signal to effect proper polyadenylation of the 
transcript. The nature of the polyadenylation signal is not believed to be crucial to the successful 
practice of the invention, and any such sequence may be employed. Preferred embodiments include the 
SV40 polyadenylation signal and the bovine growth hormone polyadenylation signal, convenient and 
known to function well in various target cells. Also contemplated as an element of the expression 
cassette is a terminator. These elements can serve to enhance message levels and to minimize read 
through from the cassette into other sequences. 

A specific initiation signal also may be required for efficient translation of coding sequences. 
These signals include the ATG initiation codon and adjacent sequences. Exogenous translational control 
signals, including the ATG initiation codon, may need to be provided. One of ordinary skill in the art would 
readily be capable of determining this and providing the necessary signals. It is well known that the 
initiation codon must be "in-frame" with the reading frame of the desired coding sequence to ensure 
translation of the entire insert. The exogenous translational control signals and initiation codons can be 
either natural or synthetic. The efficiency of expression may be enhanced by the inclusion of appropriate 
transcription enhancer elements (Bittner et al., 1987). 
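
The "in-frame" requirement for an exogenous initiation codon can be checked mechanically. The sketch below is a minimal illustration; the function name, the coordinates, and the example construct are hypothetical, and it assumes the construct is given as the sense-strand DNA sequence.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def atg_in_frame(construct: str, atg_pos: int, cds_start: int) -> bool:
    """Check that the ATG at `atg_pos` reads into the coding sequence.

    Returns True when the ATG is a whole number of codons upstream of
    `cds_start` (0-based positions on the sense strand) and no stop codon
    lies between them; returns False otherwise.
    """
    construct = construct.upper()
    if construct[atg_pos:atg_pos + 3] != "ATG":
        raise ValueError("no ATG at the given position")
    if atg_pos > cds_start:
        raise ValueError("the initiation codon must not lie after the insert")
    if (cds_start - atg_pos) % 3 != 0:
        return False  # frameshift between the ATG and the insert
    upstream_codons = (construct[i:i + 3] for i in range(atg_pos, cds_start, 3))
    return not any(codon in STOP_CODONS for codon in upstream_codons)


# Hypothetical construct: leader + ATG + 6-base linker + start of the insert.
in_frame = "GCCACC" + "ATG" + "GGATCC" + "GCTAAG"
shifted = "GCCACC" + "ATG" + "GGATCCA" + "GCTAAG"  # one extra linker base
print(atg_in_frame(in_frame, 6, 15), atg_in_frame(shifted, 6, 16))
```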

In various embodiments of the invention, the expression construct may comprise a virus or 
engineered construct derived from a viral genome. The ability of certain viruses to enter cells via 
receptor-mediated endocytosis, to integrate into the host cell genome, and to express viral genes stably 
and efficiently has made them attractive candidates for the transfer of foreign genes into mammalian 
cells (Ridgeway, 1988; Nicolas and Rubenstein, 1988; Baichwal and Sugden, 1986; Temin, 1986). The 
first viruses used as vectors were DNA viruses including the papovaviruses (simian virus 40, bovine 
papilloma virus, and polyoma) (Ridgeway, 1988; Baichwal and Sugden, 1986) and adenoviruses 
(Ridgeway, 1988; Baichwal and Sugden, 1986). Retroviruses also are 
attractive gene transfer vehicles (Nicolas and Rubenstein, 1988; Temin, 1986), as are vaccinia virus 
(Ridgeway, 1988) and adeno-associated virus (Ridgeway, 1988). Such vectors may be used to (i) 
transform cell lines in vitro for the purpose of expressing proteins of interest or (ii) to transform cells in 
vitro or in vivo to provide therapeutic polypeptides in a gene therapy scenario. 

In some embodiments, the vector is HSV. Because HSV is neurotropic, it has generated 
considerable interest in treating nervous system disorders. Since insulin-secreting pancreatic β-cells 
share many features with neurons, HSV may be useful for delivering genes to β-cells and for gene therapy 
of diabetes. Moreover, the ability of HSV to establish latent infections in non-dividing neuronal cells 
without integrating into the host cell chromosome or otherwise altering the host cell's metabolism, along 
with the existence of a promoter that is active during latency, makes HSV an attractive vector. And though much attention has focused 
on the neurotropic applications of HSV, this vector also can be exploited for other tissues. 

Another factor that makes HSV an attractive vector is the size and organization of the genome. 
Because HSV is large, incorporation of multiple genes or expression cassettes is less problematic than in 
other smaller viral systems. In addition, the availability of different viral control sequences with varying 
performance (temporal, strength, etc.) makes it possible to control expression to a greater extent than in 
other systems. It also is an advantage that the virus has relatively few spliced messages, further easing 
genetic manipulations. 

HSV also is relatively easy to manipulate and can be grown to high titers. Thus, delivery is less 
of a problem, both in terms of volumes needed to attain sufficient MOI and in a lessened need for repeat 
dosings. 

F. Encoded Proteins 

Once the entire coding sequence of a marker-associated gene has been determined, the gene can be 
inserted into an appropriate expression system. The gene can be expressed in any number of different 
recombinant DNA expression systems to generate large amounts of the polypeptide product, which can then 
be purified and used to vaccinate animals to generate antisera with which further studies may be conducted. 

Examples of expression systems known to the skilled practitioner in the art include bacteria such as 
E. coli, yeast such as Saccharomyces cerevisiae and Pichia pastoris, baculovirus, and mammalian expression 
systems such as in COS or CHO cells. In one embodiment, polypeptides are expressed in E. coli and in 
baculovirus expression systems. A complete gene can be expressed or, alternatively, fragments of the gene 
encoding portions of polypeptide can be produced. 

In one embodiment, the gene sequence encoding the polypeptide is analyzed to detect putative 
transmembrane sequences. Such sequences are typically very hydrophobic and are readily detected by the 
use of standard sequence analysis software, such as MacVector (IBI, New Haven, CT). The presence of 
transmembrane sequences is often deleterious when a recombinant protein is synthesized in many 
expression systems, especially E. coli, as it leads to the production of insoluble aggregates that are difficult 
to renature into the native conformation of the protein. Deletion of transmembrane sequences typically does 
not significantly alter the conformation of the remaining protein structure. 

Moreover, transmembrane sequences, being by definition embedded within a membrane, are 
inaccessible. Therefore, antibodies to these sequences will not prove useful for in vivo or in situ studies. 
Deletion of transmembrane encoding sequences from the genes used for expression can be achieved by 
standard techniques. For example, fortuitously-placed restriction enzyme sites can be used to excise the 
desired gene fragment, or PCR-type amplification can be used to amplify only the desired part of the gene. 
The skilled practitioner will realize that such changes must be designed so as not to change the translational 
reading frame for downstream portions of the protein-encoding sequence. 
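
The reading-frame constraint on such deletions can be expressed as a simple length check. The following is a minimal sketch with a hypothetical function name and coding sequence; it only guards the downstream frame, and a deletion that does not start on a codon boundary may still alter the single codon formed at the junction.

```python
def delete_in_frame(cds: str, start: int, end: int) -> str:
    """Remove cds[start:end] (0-based, end exclusive) from a coding sequence.

    The deletion is refused unless its length is a multiple of three, so
    that codons downstream of the junction keep their original frame.
    """
    if not 0 <= start < end <= len(cds):
        raise ValueError("deletion coordinates fall outside the sequence")
    if (end - start) % 3 != 0:
        raise ValueError("deletion length must be a multiple of three")
    return cds[:start] + cds[end:]


# Hypothetical coding sequence: ATG GCT CTG CTG TAA; remove codons 2 and 3.
print(delete_in_frame("ATGGCTCTGCTGTAA", 3, 9))
```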

In one embodiment, computer sequence analysis is used to determine the location of the predicted 
major antigenic determinant epitopes of the polypeptide. Software capable of carrying out this analysis is 
readily available commercially, for example MacVector (IBI, New Haven, CT). The software typically uses 
standard algorithms such as the Kyte/Doolittle or Hopp/Woods methods for locating hydrophilic sequences 
which are characteristically found on the surface of proteins and are, therefore, likely to act as antigenic 
determinants. 

Once this analysis is made, polypeptides can be prepared that contain at least the essential features 
of the antigenic determinant and that can be employed in the generation of antisera against the polypeptide. 
Minigenes or gene fusions encoding these determinants can be constructed and inserted into expression 
vectors by standard methods, for example, using PCR methodology. 

The gene or gene fragment encoding a polypeptide can be inserted into an expression vector by 
standard subcloning techniques. In one embodiment, an E. coli expression vector is used that produces the 
recombinant polypeptide as a fusion protein, allowing rapid affinity purification of the protein. Examples of 
such fusion protein expression systems are the glutathione S-transferase system (Pharmacia, Piscataway, 
NJ), the maltose binding protein system (NEB, Beverly, MA), the FLAG system (IBI, New Haven, CT), and 
the 6xHis system (Qiagen, Chatsworth, CA). 

Some of these systems produce recombinant polypeptides bearing only a small number of additional 
amino acids, which are unlikely to affect the antigenic ability of the recombinant polypeptide. For example, 
both the FLAG system and the 6xHis system add only short sequences, both of which are known to be poorly 
antigenic and which do not adversely affect folding of the polypeptide to its native conformation. Other 
fusion systems produce polypeptides where it is desirable to excise the fusion partner from the desired 
polypeptide. In one embodiment, the fusion partner is linked to the recombinant polypeptide by a peptide 
sequence containing a specific recognition sequence for a protease. Examples of suitable sequences are 
those recognized by the Tobacco Etch Virus protease (Life Technologies, Gaithersburg, MD) or Factor Xa 
(New England Biolabs, Beverly, MA). 

Recombinant bacterial cells, for example E. coli, are grown in any of a number of suitable media, for 
example LB, and the expression of the recombinant polypeptide is induced by adding IPTG to the media or 
switching incubation to a higher temperature. After culturing the bacteria for a further period of between 2 
and 24 hours, the cells are collected by centrifugation and washed to remove residual media. The bacterial 
cells are then lysed, for example, by disruption in a cell homogenizer, and centrifuged to separate the dense 
inclusion bodies and cell membranes from the soluble cell components. This centrifugation can be performed 
under conditions whereby the dense inclusion bodies are selectively enriched by incorporation of sugars such 
as sucrose into the buffer and centrifugation at a selective speed. 

In another embodiment, the expression system used is one driven by the baculovirus polyhedrin 
promoter. The gene encoding the polypeptide can be manipulated by standard techniques in order to 
facilitate cloning into the baculovirus vector. One baculovirus vector is the pBlueBac vector (Invitrogen, 
Sorrento, CA). The vector carrying the gene for the polypeptide is transfected into Spodoptera frugiperda 
(Sf9) cells by standard protocols, and the cells are cultured and processed to produce the recombinant 
antigen. See Summers et al., A MANUAL OF METHODS FOR BACULOVIRUS VECTORS AND INSECT CELL 
CULTURE PROCEDURES, Texas Agricultural Experimental Station. 

As an alternative to recombinant polypeptides, synthetic peptides corresponding to the antigenic 
determinants can be prepared. Such peptides are at least six amino acid residues long, and may contain up 
to approximately 35 residues, which is the approximate upper length limit of automated peptide synthesis 
machines, such as those available from Applied Biosystems (Foster City, CA). Use of such small peptides for 
vaccination typically requires conjugation of the peptide to an immunogenic carrier protein such as hepatitis 
B surface antigen, keyhole limpet hemocyanin or bovine serum albumin. Methods for performing this 
conjugation are well known in the art. 

In one embodiment, amino acid sequence variants of the polypeptide can be prepared. These may, 
for instance, be minor sequence variants of the polypeptide that arise due to natural variation within the 
population or they may be homologues found in other species. They also may be sequences that do not 
occur naturally but that are sufficiently similar that they function similarly and/or elicit an immune response 
that cross-reacts with natural forms of the polypeptide. Sequence variants can be prepared by standard 
methods of site directed mutagenesis such as those described below in the following section. 

Amino acid sequence variants of the polypeptide can be substitutional, insertional or deletion 
variants. Deletion variants lack one or more residues of the native protein which are not essential for 
function or immunogenic activity, and are exemplified by the variants lacking a transmembrane sequence 
described above. Another common type of deletion variant is one lacking secretory signal sequences or 
signal sequences directing a protein to bind to a particular part of a cell. An example of the latter sequence 
is the SH2 domain, which induces protein binding to phosphotyrosine residues. 

Substitutional variants typically contain the exchange of one amino acid for another at one or more 
sites within the protein, and may be designed to modulate one or more properties of the polypeptide such as 
stability against proteolytic cleavage. Substitutions preferably are conservative, that is, one amino acid is 
replaced with one of similar shape and charge. Conservative substitutions are well known in the art and 
include, for example, the changes of: alanine to serine; arginine to lysine; asparagine to glutamine or 
histidine; aspartate to glutamate; cysteine to serine; glutamine to asparagine; glutamate to aspartate; 
glycine to proline; histidine to asparagine or glutamine; isoleucine to leucine or valine; leucine to valine or 
isoleucine; lysine to arginine; methionine to leucine or isoleucine; phenylalanine to tyrosine, leucine or 
methionine; serine to threonine; threonine to serine; tryptophan to tyrosine; tyrosine to tryptophan or 
phenylalanine; and valine to isoleucine or leucine. 
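
The conservative substitutions listed above can be captured as a lookup table and used to flag non-conservative changes in a proposed variant. This is a sketch only: the one-letter encoding, the function names, and the example sequences are illustrative assumptions, and the table is directional exactly as the list is written (for example, alanine to serine is listed, serine to alanine is not).

```python
# Original residue -> replacements named in the list above (one-letter codes).
CONSERVATIVE = {
    "A": "S", "R": "K", "N": "QH", "D": "E", "C": "S", "Q": "N", "E": "D",
    "G": "P", "H": "NQ", "I": "LV", "L": "VI", "K": "R", "M": "LI",
    "F": "YLM", "S": "T", "T": "S", "W": "Y", "Y": "WF", "V": "IL",
}


def is_conservative(original: str, replacement: str) -> bool:
    """True if `replacement` is listed as a conservative change for `original`."""
    return replacement in CONSERVATIVE.get(original, "")


def nonconservative_positions(native: str, variant: str) -> list[int]:
    """Return 0-based positions where the variant differs non-conservatively."""
    if len(native) != len(variant):
        raise ValueError("substitutional variants must keep the same length")
    return [
        i for i, (a, b) in enumerate(zip(native, variant))
        if a != b and not is_conservative(a, b)
    ]


# Hypothetical four-residue example: K->R and A->S are listed, L->P is not.
print(nonconservative_positions("MKAL", "MRSP"))
```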

Insertional variants include fusion proteins such as those used to allow rapid purification of the 
polypeptide and also can include hybrid proteins containing sequences from other proteins and polypeptides 
which are homologues of the polypeptide. For example, an insertional variant could include portions of the 
amino acid sequence of the polypeptide from one species, together with portions of the homologous 
polypeptide from another species. Other insertional variants can include those in which additional amino 
acids are introduced within the coding sequence of the polypeptide. These typically are smaller insertions 
than the fusion proteins described above and are introduced, for example, into a protease cleavage site. 

In one embodiment, major antigenic determinants of the polypeptide are identified by an empirical 
approach in which portions of the gene encoding the polypeptide are expressed in a recombinant host, and 
the resulting proteins tested for their ability to elicit an immune response. For example, PCR can be used to 
prepare a range of cDNAs encoding peptides lacking successively longer fragments of the C terminus of the 
protein. The immunoprotective activity of each of these peptides then identifies those fragments or domains 
of the polypeptide that are essential for this activity. Further experiments in which only a small number of 
amino acids are removed at each iteration then allow the antigenic determinants of the 
polypeptide to be located. 

Another embodiment for the preparation of the polypeptides according to the invention is the use of 
peptide mimetics. Mimetics are peptide-containing molecules that mimic elements of protein secondary 
structure. See, for example, Johnson et al., "Peptide Turn Mimetics" in BIOTECHNOLOGY AND PHARMACY, 
Pezzuto et al., Eds., Chapman and Hall, New York (1993). The underlying rationale behind the use of peptide 
mimetics is that the peptide backbone of proteins exists chiefly to orient amino acid side chains in such a 
way as to facilitate molecular interactions, such as those of antibody and antigen. A peptide mimetic is 
expected to permit molecular interactions similar to the natural molecule. 

Successful applications of the peptide mimetic concept have thus far focused on mimetics of 
β-turns within proteins, which are known to be highly antigenic. Likely β-turn structure within a polypeptide 
can be predicted by computer-based algorithms as discussed above. Once the component amino acids of the 
turn are determined, peptide mimetics can be constructed to achieve a similar spatial orientation of the 
essential elements of the amino acid side chains. 

Modification and changes may be made in the structure of a gene and still obtain a functional 
molecule that encodes a protein or polypeptide with desirable characteristics. The following is a discussion 
based upon changing the amino acids of a protein to create an equivalent, or even an improved, second- 
generation molecule. The amino acid changes may be achieved by changing the codons of the DNA 
sequence, according to the following data. 

For example, certain amino acids may be substituted for other amino acids in a protein structure 
without appreciable loss of interactive binding capacity with structures such as, for example, antigen-binding 
regions of antibodies or binding sites on substrate molecules. Since it is the interactive capacity and nature 
of a protein that defines that protein's biological functional activity, certain amino acid substitutions can be 
made in a protein sequence, and its underlying DNA coding sequence, and nevertheless obtain a protein with 
like properties. It is thus contemplated by the inventors that various changes may be made in the DNA 
sequences of genes without appreciable loss of their biological utility or activity. 

In making such changes, the hydropathic index of amino acids may be considered. The importance of 
the hydropathic amino acid index in conferring interactive biologic function on a protein is generally 
understood in the art (Kyte & Doolittle, 1982). 

TABLE 3

Amino Acids                     Codons

Alanine         Ala     A       GCA GCC GCG GCU
Cysteine        Cys     C       UGC UGU
Aspartic acid   Asp     D       GAC GAU
Glutamic acid   Glu     E       GAA GAG
Phenylalanine   Phe     F       UUC UUU
Glycine         Gly     G       GGA GGC GGG GGU
Histidine       His     H       CAC CAU
Isoleucine      Ile     I       AUA AUC AUU
Lysine          Lys     K       AAA AAG
Leucine         Leu     L       UUA UUG CUA CUC CUG CUU
Methionine      Met     M       AUG
Asparagine      Asn     N       AAC AAU
Proline         Pro     P       CCA CCC CCG CCU
Glutamine       Gln     Q       CAA CAG
Arginine        Arg     R       AGA AGG CGA CGC CGG CGU
Serine          Ser     S       AGC AGU UCA UCC UCG UCU
Threonine       Thr     T       ACA ACC ACG ACU
Valine          Val     V       GUA GUC GUG GUU
Tryptophan      Trp     W       UGG
Tyrosine        Tyr     Y       UAC UAU

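
The codon assignments of Table 3 can be loaded into a dictionary and used to translate a message or to list the synonymous codons available when changing an amino acid at the DNA level. The sketch below is illustrative: the names are hypothetical, and the three stop codons, which Table 3 does not list, are added here as an assumption from the standard genetic code.

```python
# One-letter amino acid code -> codons, as given in Table 3.
TABLE_3 = {
    "A": "GCA GCC GCG GCU", "C": "UGC UGU", "D": "GAC GAU", "E": "GAA GAG",
    "F": "UUC UUU", "G": "GGA GGC GGG GGU", "H": "CAC CAU",
    "I": "AUA AUC AUU", "K": "AAA AAG", "L": "UUA UUG CUA CUC CUG CUU",
    "M": "AUG", "N": "AAC AAU", "P": "CCA CCC CCG CCU", "Q": "CAA CAG",
    "R": "AGA AGG CGA CGC CGG CGU", "S": "AGC AGU UCA UCC UCG UCU",
    "T": "ACA ACC ACG ACU", "V": "GUA GUC GUG GUU", "W": "UGG",
    "Y": "UAC UAU",
}
CODON_TABLE = {c: aa for aa, codons in TABLE_3.items() for c in codons.split()}
STOP_CODONS = {"UAA", "UAG", "UGA"}  # not in Table 3; standard code assumed


def codons_for(amino_acid: str) -> list[str]:
    """Return every codon that Table 3 lists for a one-letter amino acid."""
    return TABLE_3[amino_acid].split()


def translate(mrna: str) -> str:
    """Translate an mRNA (or sense-strand DNA) from its first base to a stop."""
    mrna = mrna.upper().replace("T", "U")
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if codon in STOP_CODONS:
            break
        protein.append(CODON_TABLE[codon])
    return "".join(protein)


print(translate("AUGGCCAAGUAA"), codons_for("K"))
```
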
It is accepted that the relative hydropathic character of the amino acid contributes to the 
secondary structure of the resultant protein, which in turn defines the interaction of the protein with 
other molecules, for example, enzymes, substrates, receptors, DNA, antibodies, antigens, and the like. 

Each amino acid has been assigned a hydropathic index on the basis of its hydrophobicity and 
charge characteristics (Kyte & Doolittle, 1982). These are: isoleucine (+4.5); valine (+4.2); leucine 
(+3.8); phenylalanine (+2.8); cysteine/cystine (+2.5); methionine (+1.9); alanine (+1.8); glycine (-0.4); 
threonine (-0.7); serine (-0.8); tryptophan (-0.9); tyrosine (-1.3); proline (-1.6); histidine (-3.2); glutamate 
(-3.5); glutamine (-3.5); aspartate (-3.5); asparagine (-3.5); lysine (-3.9); and arginine (-4.5). 

It is known in the art that certain amino acids may be substituted by other amino acids having a 
similar hydropathic index or score and still result in a protein with similar biological activity, i.e., still 
obtain a biological functionally equivalent protein. In making such changes, the substitution of amino 
acids whose hydropathic indices are within ±2 is preferred, those which are within ±1 are particularly 
preferred, and those within ±0.5 are even more particularly preferred. 

It is also understood in the art that the substitution of like amino acids can be made effectively 
on the basis of hydrophilicity. U.S. Patent 4,554,101, incorporated herein by reference, states that the 
greatest local average hydrophilicity of a protein, as governed by the hydrophilicity of its adjacent amino 
acids, correlates with a biological property of the protein. 

As detailed in U.S. Patent 4,554,101, the following hydrophilicity values have been assigned to 
amino acid residues: arginine (+3.0); lysine (+3.0); aspartate (+3.0 ± 1); glutamate (+3.0 ± 1); serine 
(+0.3); asparagine (+0.2); glutamine (+0.2); glycine (0); threonine (-0.4); proline (-0.5 ± 1); alanine 
(-0.5); histidine (-0.5); cysteine (-1.0); methionine (-1.3); valine (-1.5); leucine (-1.8); isoleucine (-1.8); 
tyrosine (-2.3); phenylalanine (-2.5); tryptophan (-3.4). 

It is understood that an amino acid can be substituted for another having a similar hydrophilicity 
value and still obtain a biologically equivalent and immunologically equivalent protein. In such changes, the 
substitution of amino acids whose hydrophilicity values are within ±2 is preferred, those that are within 
± 1 are particularly preferred, and those within ±0.5 are even more particularly preferred. 
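
The "greatest local average hydrophilicity" referred to above can be located with a sliding window over the values just listed. This is a sketch under stated assumptions: the six-residue window, the function name, and the example sequence are illustrative, and the values given with a ± 1 range are entered at their central value.

```python
# Hydrophilicity values as listed above (central value where a range is given).
HYDROPHILICITY = {
    "R": 3.0, "K": 3.0, "D": 3.0, "E": 3.0, "S": 0.3, "N": 0.2, "Q": 0.2,
    "G": 0.0, "T": -0.4, "P": -0.5, "A": -0.5, "H": -0.5, "C": -1.0,
    "M": -1.3, "V": -1.5, "L": -1.8, "I": -1.8, "Y": -2.3, "F": -2.5,
    "W": -3.4,
}


def most_hydrophilic_window(protein: str, window: int = 6) -> tuple[int, float]:
    """Return (start, average) for the window with the highest mean value.

    `start` is 0-based; when several windows tie, the first one is returned.
    """
    protein = protein.upper()
    if len(protein) < window:
        raise ValueError("protein is shorter than the window")
    best_start, best_avg = 0, float("-inf")
    for start in range(len(protein) - window + 1):
        segment = protein[start:start + window]
        avg = sum(HYDROPHILICITY[aa] for aa in segment) / window
        if avg > best_avg:
            best_start, best_avg = start, avg
    return best_start, best_avg


# Hypothetical sequence: a charged hexapeptide flanked by leucine runs.
print(most_hydrophilic_window("LLLLLLKDEKDELLLLLL"))
```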

As outlined above, amino acid substitutions are generally based on the relative similarity of the 
amino acid side-chain substituents, for example, their hydrophobicity, hydrophilicity, charge, size, and the 
like. Exemplary substitutions that take various of the foregoing characteristics into consideration are 
well known to those of skill in the art and include: arginine and lysine; glutamate and aspartate; serine 
and threonine; glutamine and asparagine; and valine, leucine and isoleucine. 

G. Site-Specific Mutagenesis 

Site-specific mutagenesis is a technique useful in the preparation of individual peptides, or 
biologically functional equivalent proteins or peptides, through specific mutagenesis of the underlying 
DNA. The technique further provides a ready ability to prepare and test sequence variants, incorporating 
one or more of the foregoing considerations, by introducing one or more nucleotide sequence changes into 
the DNA. Site-specific mutagenesis allows the production of mutants through the use of specific 
oligonucleotide sequences which encode the DNA sequence of the desired mutation, as well as a 
sufficient number of adjacent nucleotides, to provide a primer sequence of sufficient size and sequence 
complexity to form a stable duplex on both sides of the deletion junction being traversed. Typically, a 
primer of about 17 to 25 nucleotides in length is preferred, with about 5 to 10 residues on both sides of 
the junction of the sequence being altered. 
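
The primer geometry described above, a changed sequence flanked by matching residues on each side, can be sketched as follows. The function name, the template, and the chosen position are hypothetical; the primer is written in the same sense as the template shown, so in practice it would anneal to the complementary strand, and real primer design would also weigh melting temperature and secondary structure.

```python
def mutagenic_primer(template: str, pos: int, new_bases: str, flank: int = 10) -> str:
    """Build a primer carrying `new_bases` in place of the template bases
    at `pos` (0-based), flanked on each side by `flank` matching bases."""
    if not 5 <= flank <= 10:
        raise ValueError("about 5 to 10 flanking residues are expected")
    end = pos + len(new_bases)
    if pos - flank < 0 or end + flank > len(template):
        raise ValueError("not enough template sequence around the mutation")
    return template[pos - flank:pos] + new_bases + template[end:end + flank]


# Hypothetical 30-base template; change the base at position 15 to T.
template = "ATGGCTAGCAAGGGCGAGGAGCTGTTCACC"
primer = mutagenic_primer(template, 15, "T", flank=8)
print(primer, len(primer))
```

With eight flanking residues on each side of a single-base change the primer is 17 nucleotides long, the lower end of the preferred range; ten flanking residues give 21.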

In general, the technique of site-specific mutagenesis is well known in the art. As will be 
appreciated, the technique typically employs a bacteriophage vector that exists in both a single stranded 
and double stranded form. Typical vectors useful in site-directed mutagenesis include vectors such as the 
M13 phage. These phage vectors are commercially available and their use is generally well known to 
those skilled in the art. Double stranded plasmids are also routinely employed in site-directed 
mutagenesis, which eliminates the step of transferring the gene of interest from a phage to a plasmid. 

In general, site-directed mutagenesis is performed by first obtaining a single-stranded vector, or 
melting of two strands of a double stranded vector which includes within its sequence a DNA sequence 
encoding the desired protein. An oligonucleotide primer bearing the desired mutated sequence is 
synthetically prepared. This primer is then annealed with the single-stranded DNA preparation, and 
subjected to DNA polymerizing enzymes such as E. coli polymerase I Klenow fragment, in order to 
complete the synthesis of the mutation-bearing strand. Thus, a heteroduplex is formed wherein one 
strand encodes the original non-mutated sequence and the second strand bears the desired mutation. 
This heteroduplex vector is then used to transform appropriate cells, such as E. coli cells, and clones are 
selected that include recombinant vectors bearing the mutated sequence arrangement. 

The preparation of sequence variants of the selected gene using site-directed mutagenesis is 
provided as a means of producing potentially useful species and is not meant to be limiting, as there are 
other ways in which sequence variants of genes may be obtained. For example, recombinant vectors 
encoding the desired gene may be treated with mutagenic agents, such as hydroxylamine, to obtain 
sequence variants. 

H. Expression and Purification of Encoded Proteins 
1. Expression of Proteins from Cloned cDNAs 

The cDNA species specified in SEQ ID NO:1, SEQ ID NO:3, SEQ ID NO:5, SEQ ID NO:7, and 
HNF1α can be expressed as encoded peptides or proteins. In other embodiments cDNA species specified 
in SEQ ID NO:78, SEQ ID NO:34, SEQ ID NO:36, SEQ ID NO:38, SEQ ID NO:40, SEQ ID NO:42, SEQ ID 
NO:44, SEQ ID NO:46, SEQ ID NO:48, SEQ ID NO:50, SEQ ID NO:52, SEQ ID NO:54, and HNF4α can be 
expressed as encoded peptides or proteins. The DNA species specified in SEQ ID NO:128 and HNF1β can 
be expressed as encoded peptides or proteins. The engineering of DNA segment(s) for expression in a 
prokaryotic or eukaryotic system may be performed by techniques generally known to those of skill in 
recombinant expression. It is believed that virtually any expression system may be employed in the 
expression of the claimed nucleic acid sequences. 

Both cDNA and genomic sequences are suitable for eukaryotic expression, as the host cell will 
generally process the genomic transcripts to yield functional mRNA for translation into protein. Generally 
speaking, it may be more convenient to employ as the recombinant gene a cDNA version of the gene. It is 
believed that the use of a cDNA version will provide advantages in that the size of the gene will generally 
be much smaller and more readily employed to transfect the targeted cell than will a genomic gene, which 
will typically be up to an order of magnitude larger than the cDNA gene. However, the inventor does not 
exclude the possibility of employing a genomic version of a particular gene where desired. 

As used herein, the terms "engineered" and "recombinant" cells are intended to refer to a cell into 
which an exogenous DNA segment or gene, such as a cDNA or gene, has been introduced. Therefore, 
engineered cells are distinguishable from naturally occurring cells which do not contain a recombinantly 
introduced exogenous DNA segment or gene. Engineered cells are thus cells having a gene or genes 
introduced through the hand of man. Recombinant cells include those having an introduced cDNA or 
genomic DNA, and also include genes positioned adjacent to a promoter not naturally associated with the 
particular introduced gene. 

To express a recombinant encoded protein or peptide, whether mutant or wild-type, in accordance 
with the present invention one would prepare an expression vector that comprises one of the claimed 
isolated nucleic acids under the control of one or more promoters. To bring a coding sequence "under the 
control of" a promoter, one positions the 5' end of the translational initiation site of the reading frame 
generally between about 1 and 50 nucleotides "downstream" of (i.e., 3' of) the chosen promoter. The 
"upstream" promoter stimulates transcription of the inserted DNA and promotes expression of the 
encoded recombinant protein. This is the meaning of "recombinant expression" in the context used here. 

Many standard techniques are available to construct expression vectors containing the 
appropriate nucleic acids and transcriptional/translational control sequences in order to achieve protein or 
peptide expression in a variety of host-expression systems. Cell types available for expression include, 
but are not limited to, bacteria, such as E. coli and B. subtilis transformed with recombinant phage DNA, 
plasmid DNA or cosmid DNA expression vectors. 

Certain examples of prokaryotic hosts are E. coli strain RR1, E. coli LE392, E. coli B, E. coli χ 
1776 (ATCC No. 31537) as well as E. coli W3110 (F-, lambda-, prototrophic, ATCC No. 273325); bacilli 
such as Bacillus subtilis; and other enterobacteriaceae such as Salmonella typhimurium, Serratia 
marcescens, and various Pseudomonas species. 

In general, plasmid vectors containing replicon and control sequences that are derived from 
species compatible with the host cell are used in connection with these hosts. The vector ordinarily 
carries a replication site, as well as marking sequences that are capable of providing phenotypic selection 
in transformed cells. For example, E. coli is often transformed using pBR322, a plasmid derived from an 
E. coli species. Plasmid pBR322 contains genes for ampicillin and tetracycline resistance and thus 
provides easy means for identifying transformed cells. The pBR322 plasmid, or other microbial plasmid or 
phage must also contain, or be modified to contain, promoters that can be used by the microbial organism 
for expression of its own proteins. 

In addition, phage vectors containing replicon and control sequences that are compatible with the 
host microorganism can be used as transforming vectors in connection with these hosts. For example, 
the phage lambda GEM™-11 may be utilized in making a recombinant phage vector that can be used to 
transform host cells, such as E. coli LE392. 

Further useful vectors include pIN vectors (Inouye et al., 1985); and pGEX vectors, for use in 
generating glutathione transferase (GST) soluble fusion proteins for later purification and separation or 
cleavage. Other suitable fusion proteins are those with β-galactosidase, ubiquitin, or the like. 

Promoters that are most commonly used in recombinant DNA construction include the β-lactamase 
(penicillinase), lactose and tryptophan (trp) promoter systems. While these are the most 
commonly used, other microbial promoters have been discovered and utilized, and details concerning their 
nucleotide sequences have been published, enabling those of skill in the art to ligate them functionally 
with plasmid vectors. 

For expression in Saccharomyces, the plasmid YRp7, for example, is commonly used (Stinchcomb 
et al., 1979; Kingsman et al., 1979; Tschemper et al., 1980). This plasmid contains the trp1 gene, which 
provides a selection marker for a mutant strain of yeast lacking the ability to grow without tryptophan, for 
example ATCC No. 44076 or PEP4-1 (Jones, 1977). The presence of the trp1 lesion as a characteristic of 
the yeast host cell genome then provides an effective environment for detecting transformation by 
growth in the absence of tryptophan. 

Suitable promoting sequences in yeast vectors include the promoters for 3-phosphoglycerate 
kinase (Hitzeman et al., 1980) or other glycolytic enzymes (Hess et al., 1968; Holland et al., 1978), such 
as enolase, glyceraldehyde-3-phosphate dehydrogenase, hexokinase, pyruvate decarboxylase, 
phosphofructokinase, glucose-6-phosphate isomerase, 3-phosphoglycerate mutase, pyruvate kinase, 
triosephosphate isomerase, phosphoglucose isomerase, and glucokinase. In constructing suitable 
expression plasmids, the termination sequences associated with these genes are also ligated into the 
expression vector 3' of the sequence desired to be expressed to provide polyadenylation of the mRNA and 
termination. 

Other suitable promoters, which have the additional advantage of transcription controlled by 
growth conditions, include the promoter region for alcohol dehydrogenase 2, isocytochrome C, acid 
phosphatase, degradative enzymes associated with nitrogen metabolism, and the aforementioned 
glyceraldehyde-3-phosphate dehydrogenase, and enzymes responsible for maltose and galactose 
utilization. 

In addition to micro-organisms, cultures of cells derived from multicellular organisms may also be 
used as hosts. In principle, any such cell culture is workable, whether from vertebrate or invertebrate 
culture. In addition to mammalian cells, these include insect cell systems infected with recombinant virus 
expression vectors (e.g., baculovirus); and plant cell systems infected with recombinant virus expression 
vectors (e.g., cauliflower mosaic virus, CaMV; tobacco mosaic virus, TMV) or transformed with 
recombinant plasmid expression vectors (e.g., Ti plasmid) containing one or more coding sequences. 

In a useful insect system, Autographa californica nuclear polyhedrosis virus (AcNPV) is used as a 
vector to express foreign genes. The virus grows in Spodoptera frugiperda cells. The isolated nucleic 
acid coding sequences are cloned into non-essential regions (for example the polyhedrin gene) of the virus 
and placed under control of an AcNPV promoter (for example, the polyhedrin promoter). Successful 
insertion of the coding sequences results in the inactivation of the polyhedrin gene and production of non- 
occluded recombinant virus (i.e., virus lacking the proteinaceous coat coded for by the polyhedrin gene). 
These recombinant viruses are then used to infect Spodoptera frugiperda cells in which the inserted gene 
is expressed (e.g., U.S. Patent No. 4,215,051). 

Examples of useful mammalian host cell lines are VERO and HeLa cells, Chinese hamster ovary 
(CHO) cell lines, WI-38, BHK, COS-7, 293, HepG2, NIH3T3, RIN and MDCK cell lines. In addition, a host 
cell may be chosen that modulates the expression of the inserted sequences, or modifies and processes 
the gene product in the specific fashion desired. Such modifications (e.g., glycosylation) and processing 
(e.g., cleavage) of protein products may be important for the function of the encoded protein. 

Different host cells have characteristic and specific mechanisms for the post-translational 
processing and modification of proteins. Appropriate cell lines or host systems can be chosen to ensure 
the correct modification and processing of the foreign protein expressed. Expression vectors for use in 
mammalian cells ordinarily include an origin of replication (as necessary), a promoter located in front of 
the gene to be expressed, along with any necessary ribosome binding sites, RNA splice sites, 
polyadenylation site, and transcriptional terminator sequences. The origin of replication may be provided 
either by construction of the vector to include an exogenous origin, such as may be derived from SV40 or 
other viral (e.g., Polyoma, Adeno, VSV, BPV) source, or may be provided by the host cell chromosomal 
replication mechanism. If the vector is integrated into the host cell chromosome, the latter is often 
sufficient. 

The promoters may be derived from the genome of mammalian cells (e.g., metallothionein 
promoter) or from mammalian viruses (e.g., the adenovirus late promoter; the vaccinia virus 7.5K 
promoter). Further, it is also possible, and may be desirable, to utilize promoter or control sequences 
normally associated with the desired gene sequence, provided such control sequences are compatible with 
the host cell systems. 

A number of viral-based expression systems may be utilized; for example, commonly used 
promoters are derived from polyoma, Adenovirus 2, cytomegalovirus and Simian Virus 40 (SV40). The 
early and late promoters of SV40 virus are useful because both are obtained easily from the virus as a 
fragment which also contains the SV40 viral origin of replication. Smaller or larger SV40 fragments may 
also be used, provided there is included the approximately 250 bp sequence extending from the HindIII 
site toward the BglI site located in the viral origin of replication. 

In cases where an adenovirus is used as an expression vector, the coding sequences may be 
ligated to an adenovirus transcription/translation control complex, e.g., the late promoter and tripartite 
leader sequence. This chimeric gene may then be inserted in the adenovirus genome by in vitro or in vivo 
recombination. Insertion in a non-essential region of the viral genome (e.g., region E1 or E3) will result in 
a recombinant virus that is viable and capable of expressing proteins in infected hosts. 

Specific initiation signals may also be required for efficient translation of the claimed isolated 
nucleic acid coding sequences. These signals include the ATG initiation codon and adjacent sequences. 
Exogenous translational control signals, including the ATG initiation codon, may additionally need to be 
provided. One of ordinary skill in the art would readily be capable of determining this need and providing 
the necessary signals. It is well known that the initiation codon must be in-frame (or in-phase) with the 
reading frame of the desired coding sequence to ensure translation of the entire insert. These exogenous 
translational control signals and initiation codons can be of a variety of origins, both natural and 
synthetic. The efficiency of expression may be enhanced by the inclusion of appropriate transcription 
enhancer elements or transcription terminators (Bittner et al., 1987). 
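The in-frame requirement noted above lends itself to a simple check. The following is a minimal sketch only; the function name `initiation_codon_in_frame` and its parameters are hypothetical, and the additional test for stop codons between the exogenous ATG and the insert is an assumption made for the sketch rather than a requirement stated in the text.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def initiation_codon_in_frame(construct: str, atg_index: int,
                              cds_start: int) -> bool:
    """Check that an exogenous ATG reads into the coding sequence in frame.

    `atg_index` is the 0-based position of the A of the ATG and `cds_start`
    is the 0-based position of the first base of the desired coding sequence.
    """
    construct = construct.upper()
    if construct[atg_index:atg_index + 3] != "ATG":
        return False
    if cds_start < atg_index or (cds_start - atg_index) % 3 != 0:
        return False
    # Assumption: no stop codon may separate the ATG from the insert.
    linker_codons = (construct[i:i + 3]
                     for i in range(atg_index, cds_start, 3))
    return all(codon not in STOP_CODONS for codon in linker_codons)


if __name__ == "__main__":
    in_frame = "GCCACC" + "ATG" + "GGATCC" + "GCTAGCAAA"
    shifted = "GCCACC" + "ATG" + "GGATCCA" + "GCTAGCAAA"
    print(initiation_codon_in_frame(in_frame, 6, 15))
    print(initiation_codon_in_frame(shifted, 6, 16))
```

In the demonstration sequences, a six-nucleotide linker keeps the insert in frame, while a seven-nucleotide linker shifts it out of frame.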

In eukaryotic expression, one will also typically desire to incorporate into the transcriptional unit 
an appropriate polyadenylation site (e.g., 5'-AATAAA-3') if one was not contained within the original 
cloned segment. Typically, the poly A addition site is placed about 30 to 2000 nucleotides "downstream" 
of the termination site of the protein at a position prior to transcription termination. 
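As an illustration of the placement window just described, the sketch below scans a sequence for the AATAAA signal and reports those occurrences lying about 30 to 2000 nucleotides downstream of the end of the coding region. The function name `polya_signal_offsets` is hypothetical, and measuring the offset from the first base after the stop codon is an assumption made for the sketch.

```python
def polya_signal_offsets(transcript: str, stop_codon_end: int,
                         signal: str = "AATAAA",
                         min_offset: int = 30,
                         max_offset: int = 2000) -> list:
    """Return offsets of poly A signals that fall inside the window.

    `stop_codon_end` is the 0-based index of the first base after the stop
    codon; each offset counts nucleotides from that position to the first
    base of the signal.
    """
    transcript = transcript.upper()
    offsets = []
    start = transcript.find(signal, stop_codon_end)
    while start != -1:
        offset = start - stop_codon_end
        if min_offset <= offset <= max_offset:
            offsets.append(offset)
        start = transcript.find(signal, start + 1)
    return offsets


if __name__ == "__main__":
    demo = "ATGGCTTAA" + "C" * 40 + "AATAAA" + "C" * 10
    print(polya_signal_offsets(demo, stop_codon_end=9))
```

An empty result would suggest that a polyadenylation site needs to be added to the transcriptional unit, or that an existing one lies outside the typical window.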

For long-term, high-yield production of recombinant proteins, stable expression is preferred. For 
example, cell lines that stably express constructs encoding proteins may be engineered. Rather than 
using expression vectors that contain viral origins of replication, host cells can be transformed with 
vectors controlled by appropriate expression control elements (e.g., promoter, enhancer sequences, 
transcription terminators, polyadenylation sites, etc.), and a selectable marker. Following the introduction 
of foreign DNA, engineered cells may be allowed to grow for 1-2 days in an enriched medium, and then 
are switched to a selective medium. The selectable marker in the recombinant plasmid confers resistance 
to the selection and allows cells to stably integrate the plasmid into their chromosomes and grow to form 
foci, which in turn can be cloned and expanded into cell lines. 

A number of selection systems may be used, including, but not limited to, the herpes simplex virus 
thymidine kinase (Wigler et al., 1977), hypoxanthine-guanine phosphoribosyltransferase (Szybalska et al., 
1962) and adenine phosphoribosyltransferase genes (Lowy et al., 1980), in tk-, hgprt- or aprt- cells, 
respectively. Also, antimetabolite resistance can be used as the basis of selection for dhfr, which confers 
resistance to methotrexate (Wigler et al., 1980; O'Hare et al., 1981); gpt, which confers resistance to 
mycophenolic acid (Mulligan et al., 1981); neo, which confers resistance to the aminoglycoside G-418 
(Colberre-Garapin et al., 1981); and hygro, which confers resistance to hygromycin. 

It is contemplated that the isolated nucleic acids of the invention may be "overexpressed", i.e., 
expressed in increased levels relative to its natural expression in human cells, or even relative to the 
expression of other proteins in the recombinant host cell. Such overexpression may be assessed by a 
variety of methods, including radio-labeling and/or protein purification. However, simple and direct 
methods are preferred, for example, those involving SDS/PAGE and protein staining or western blotting, 
followed by quantitative analyses, such as densitometry scanning of the resultant gel or blot. A specific 
increase in the level of the recombinant protein or peptide in comparison to the level in natural human 
cells is indicative of overexpression, as is a relative abundance of the specific protein in relation to the 
other proteins produced by the host cell and, e.g., visible on a gel. 

2. Purification of Expressed Proteins 

Further aspects of the present invention concern the purification, and in particular embodiments, 
the substantial purification, of an encoded protein or peptide. The term "purified protein or peptide" as 
used herein, is intended to refer to a composition, isolatable from other components, wherein the protein 
or peptide is purified to any degree relative to its naturally-obtainable state, i.e., in this case, relative to 
its purity within a hepatocyte or β-cell extract. A purified protein or peptide therefore also refers to a 
protein or peptide, free from the environment in which it may naturally occur. 

Generally, "purified" will refer to a protein or peptide composition that has been subjected to 
fractionation to remove various other components, and which composition substantially retains its 
expressed biological activity. Where the term "substantially purified" is used, this designation will refer 
to a composition in which the protein or peptide forms the major component of the composition, such as 
constituting about 50% or more of the proteins in the composition. 

Various methods for quantifying the degree of purification of the protein or peptide will be known 
to those of skill in the art in light of the present disclosure. These include, for example, determining the 
specific activity of an active fraction, or assessing the number of polypeptides within a fraction by 
SDS/PAGE analysis. A preferred method for assessing the purity of a fraction is to calculate the specific 
activity of the fraction, to compare it to the specific activity of the initial extract, and to thus calculate 
the degree of purity, herein assessed by a "-fold purification number". The actual units used to represent 
the amount of activity will, of course, be dependent upon the particular assay technique chosen to follow 
the purification and whether or not the expressed protein or peptide exhibits a detectable activity. 
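The calculation described in this paragraph can be set out as a short worked sketch. The function names and the numerical values below are hypothetical illustrations, not data from the present disclosure; activity is expressed in arbitrary units and protein in milligrams.

```python
def specific_activity(total_activity_units: float,
                      total_protein_mg: float) -> float:
    """Specific activity = activity units per mg of total protein."""
    if total_protein_mg <= 0:
        raise ValueError("total protein must be positive")
    return total_activity_units / total_protein_mg


def fold_purification(fraction_specific_activity: float,
                      extract_specific_activity: float) -> float:
    """Ratio of the fraction's specific activity to that of the extract."""
    if extract_specific_activity <= 0:
        raise ValueError("extract specific activity must be positive")
    return fraction_specific_activity / extract_specific_activity


if __name__ == "__main__":
    # Hypothetical example: initial extract versus a later active fraction.
    extract_sa = specific_activity(10000.0, 500.0)   # 20 units/mg
    fraction_sa = specific_activity(6000.0, 3.0)     # 2000 units/mg
    print(fold_purification(fraction_sa, extract_sa))
```

In this hypothetical case the fraction would be described as purified about 100-fold relative to the initial extract.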

Various techniques suitable for use in protein purification will be well known to those of skill in 
the art. These include, for example, precipitation with ammonium sulphate, polyethylene glycol, 
antibodies and the like or by heat denaturation, followed by centrifugation; chromatography steps such as 
ion exchange, gel filtration, reverse phase, hydroxylapatite and affinity chromatography; isoelectric 
focusing; gel electrophoresis; and combinations of such and other techniques. As is generally known in 
the art, it is believed that the order of conducting the various purification steps may be changed, or that 
certain steps may be omitted, and still result in a suitable method for the preparation of a substantially 
purified protein or peptide. 

There is no general requirement that the protein or peptide always be provided in their most 
purified state. Indeed, it is contemplated that less substantially purified products will have utility in 
certain embodiments. Partial purification may be accomplished by using fewer purification steps in 
combination, or by utilizing different forms of the same general purification scheme. For example, it is 
appreciated that a cation-exchange column chromatography performed utilizing an HPLC apparatus will 
generally result in a greater -fold purification than the same technique utilizing a low pressure 
chromatography system. Methods exhibiting a lower degree of relative purification may have advantages 
in total recovery of protein product, or in maintaining the activity of an expressed protein. 

It is known that the migration of a polypeptide can vary, sometimes significantly, with different 
conditions of SDS/PAGE (Capaldi et al., Biochem. Biophys. Res. Comm., 76:425, 1977). It will therefore 
be appreciated that under differing electrophoresis conditions, the apparent molecular weights of purified 
or partially purified expression products may vary. 

I. Preparation of Antibodies Specific for Encoded Proteins 

Antibody Generation 

For some embodiments, it will be desired to produce antibodies that bind with high specificity to 
the protein product(s) of an isolated nucleic acid selected from the group comprising SEQ ID NO:1, SEQ ID 
NO:3, SEQ ID NO:5, SEQ ID NO:7 or any other mutant of HNF1α, SEQ ID NO:78, SEQ ID NO:34, SEQ ID 
NO:36, SEQ ID NO:38, SEQ ID NO:40, SEQ ID NO:42, SEQ ID NO:44, SEQ ID NO:46, SEQ ID NO:48, SEQ ID 
NO:50, SEQ ID NO:52, SEQ ID NO:54, or any other mutant of HNF4α, SEQ ID NO:128 (HNF1β) or any 
mutant of HNF1β. Means for preparing and characterizing antibodies are well known in the art (See, e.g., 
Antibodies: A Laboratory Manual, Cold Spring Harbor Laboratory, 1988; incorporated herein by 
reference). 

Methods for generating polyclonal antibodies are well known in the art. Briefly, a polyclonal 
antibody is prepared by immunizing an animal with an antigenic composition and collecting antisera from 
that immunized animal. A wide range of animal species can be used for the production of antisera. 
Typically the animal used for production of antisera is a rabbit, a mouse, a rat, a hamster, a guinea pig or 
a goat. Because of the relatively large blood volume of rabbits, a rabbit is a preferred choice for 
production of polyclonal antibodies. 

As is well known in the art, a given composition may vary in its immunogenicity. It is often 
necessary therefore to boost the host immune system, as may be achieved by coupling a peptide or 
polypeptide immunogen to a carrier. Exemplary and preferred carriers are keyhole limpet hemocyanin 
(KLH) and bovine serum albumin (BSA). Other albumins such as ovalbumin, mouse serum albumin or rabbit 
serum albumin can also be used as carriers. Means for conjugating a polypeptide to a carrier protein are 
well known in the art and include glutaraldehyde, m-maleimidobenzoyl-N-hydroxysuccinimide ester, 
carbodiimide and bis-diazotized benzidine. 

As is also well known in the art, the immunogenicity of a particular immunogen composition can 
be enhanced by the use of non-specific stimulators of the immune response, known as adjuvants. 
Exemplary and preferred adjuvants include complete Freund's adjuvant (a non-specific stimulator of the 
immune response containing killed Mycobacterium tuberculosis), incomplete Freund's adjuvant and 
aluminum hydroxide adjuvant. 

The amount of immunogen composition used in the production of polyclonal antibodies varies 
with the nature of the immunogen as well as the animal used for immunization. A variety of routes can 
be used to administer the immunogen (subcutaneous, intramuscular, intradermal, intravenous and 
intraperitoneal). The production of polyclonal antibodies may be monitored by sampling blood of the 
immunized animal at various points following immunization. A second, booster injection may also be 
given. The process of boosting and titering is repeated until a suitable titer is achieved. When a desired 
level of immunogenicity is obtained, the immunized animal can be bled and the serum isolated and stored, 
and/or in some cases the animal can be used to generate MAbs. For production of rabbit polyclonal 
antibodies, the animal can be bled through an ear vein or alternatively by cardiac puncture. The removed 
blood is allowed to coagulate and then centrifuged to separate serum components from whole cells and 
blood clots. The serum may be used as is for various applications or the desired antibody fraction may be 
purified by well-known methods, such as affinity chromatography using another antibody or a peptide 
bound to a solid matrix. 

Monoclonal antibodies (MAbs) may be readily prepared through use of well-known techniques, 
such as those exemplified in U.S. Patent 4,196,265, incorporated herein by reference. Typically, this 
technique involves immunizing a suitable animal with a selected immunogen composition, e.g., a purified 
or partially purified expressed protein, polypeptide or peptide. The immunizing composition is 
administered in a manner that effectively stimulates antibody producing cells. 

The methods for generating monoclonal antibodies (MAbs) generally begin along the same lines as 
those for preparing polyclonal antibodies. Rodents such as mice and rats are preferred animals; however, 
the use of rabbit, sheep or frog cells is also possible. The use of rats may provide certain advantages 
(Goding, 1986, pp. 60-61), but mice are preferred, with the BALB/c mouse being most preferred as this is 
most routinely used and generally gives a higher percentage of stable fusions. 

The animals are injected with antigen as described above. The antigen may be coupled to carrier 
molecules such as keyhole limpet hemocyanin if necessary. The antigen would typically be mixed with 
adjuvant, such as Freund's complete or incomplete adjuvant. Booster injections with the same antigen 
would occur at approximately two-week intervals. 

Following immunization, somatic cells with the potential for producing antibodies, specifically B 
lymphocytes (B cells), are selected for use in the MAb generating protocol. These cells may be obtained 
from biopsied spleens, tonsils or lymph nodes, or from a peripheral blood sample. Spleen cells and 
peripheral blood cells are preferred, the former because they are a rich source of antibody-producing cells 
that are in the dividing plasmablast stage, and the latter because peripheral blood is easily accessible. 
Often, a panel of animals will have been immunized and the spleen of the animal with the highest antibody 
titer will be removed and the spleen lymphocytes obtained by homogenizing the spleen with a syringe. 
Typically, a spleen from an immunized mouse contains approximately 5 × 10⁷ to 2 × 10⁸ lymphocytes. 

The antibody-producing B lymphocytes from the immunized animal are then fused with cells of an 
immortal myeloma cell, generally one of the same species as the animal that was immunized. Myeloma 
cell lines suited for use in hybridoma-producing fusion procedures preferably are non-antibody-producing, 
have high fusion efficiency, and have enzyme deficiencies that render them incapable of growing in 
certain selective media that support the growth of only the desired fused cells (hybridomas). 

Any one of a number of myeloma cells may be used, as are known to those of skill in the art 
(Goding, pp. 65-66, 1986; Campbell, pp. 75-83, 1984). For example, where the immunized animal is a 
mouse, one may use P3-X63/Ag8, X63-Ag8.653, NS1/1.Ag 4 1, Sp210-Ag14, FO, NSO/U, MPC-11, 
MPC11-X45-GTG 1.7 and S194/5XX0 Bul; for rats, one may use R210.RCY3, Y3-Ag 1.2.3, IR983F and 
4B210; and U-266, GM1500-GRG2, LICR-LON-HMy2 and UC729-6 are all useful in connection with 
human cell fusions. 

One preferred murine myeloma cell is the NS-1 myeloma cell line (also termed P3-NS-1-Ag4.1), 
which is readily available from the NIGMS Human Genetic Mutant Cell Repository by requesting cell line 
repository number GM3573. Another mouse myeloma cell line that may be used is the 
8-azaguanine-resistant mouse murine myeloma SP2/0 non-producer cell line. 

Methods for generating hybrids of antibody-producing spleen or lymph node cells and myeloma 
cells usually comprise mixing somatic cells with myeloma cells in a 2:1 proportion, though the proportion 
may vary from about 20:1 to about 1:1, respectively, in the presence of an agent or agents (chemical or 
electrical) that promote the fusion of cell membranes. Fusion methods using Sendai virus have been 
described by Kohler and Milstein (1975; 1976), and those using polyethylene glycol (PEG), such as 37% 
(v/v) PEG, by Gefter et al. (1977). The use of electrically induced fusion methods is also appropriate 
(Goding pp. 71-74, 1986). 

Fusion procedures usually produce viable hybrids at low frequencies, about 1 × 10⁻⁶ to 1 × 10⁻⁸. 
However, this low frequency does not pose a problem, as the viable, fused hybrids are differentiated from 
the parental, unfused cells (particularly the unfused myeloma cells that would normally continue to divide 
indefinitely) by culturing in a selective medium. The selective medium is generally one that contains an 
agent that blocks the de novo synthesis of nucleotides in the tissue culture media. Exemplary and 
preferred agents are aminopterin, methotrexate, and azaserine. Aminopterin and methotrexate block de 
novo synthesis of both purines and pyrimidines, whereas azaserine blocks only purine synthesis. Where 
aminopterin or methotrexate is used, the media is supplemented with hypoxanthine and thymidine as a 
source of nucleotides (HAT medium). Where azaserine is used, the media is supplemented with 
hypoxanthine. 
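To give a sense of scale, the figures quoted in this section can be combined in a short worked sketch. It assumes a spleen yielding about 5 × 10⁷ to 2 × 10⁸ lymphocytes, a 2:1 ratio of somatic cells to myeloma cells, and a fusion frequency of about 1 × 10⁻⁶ to 1 × 10⁻⁸ taken per input lymphocyte; the per-lymphocyte basis is an assumption, as is the hypothetical function name `fusion_plan`.

```python
def fusion_plan(lymphocytes: float, fusion_frequency: float,
                ratio: float = 2.0) -> dict:
    """Estimate myeloma cells needed and viable hybrids expected.

    `ratio` is somatic cells per myeloma cell (2.0 for a 2:1 proportion).
    `fusion_frequency` is assumed to be viable hybrids per input lymphocyte.
    """
    if lymphocytes <= 0 or ratio <= 0:
        raise ValueError("cell number and ratio must be positive")
    if not 0 <= fusion_frequency <= 1:
        raise ValueError("fusion frequency must lie between 0 and 1")
    return {
        "myeloma_cells": lymphocytes / ratio,
        "expected_hybrids": lymphocytes * fusion_frequency,
    }


if __name__ == "__main__":
    print(fusion_plan(5e7, 1e-8))   # small spleen, low frequency
    print(fusion_plan(2e8, 1e-6))   # large spleen, high frequency
```

Under these assumptions a single fusion would be expected to yield somewhere between less than one and a few hundred viable hybrids, which is why selective culture of the rare hybrids is needed.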

The preferred selection medium is HAT. Only cells capable of operating nucleotide salvage 
pathways are able to survive in HAT medium. The myeloma cells are defective in key enzymes of the 
salvage pathway, e.g., hypoxanthine phosphoribosyl transferase (HPRT), and thus they cannot survive. 
The B cells can operate this pathway, but they have a limited life span in culture and generally die within 
about two weeks. Therefore, the only cells that can survive in the selective media are those hybrids 
formed from myeloma and B cells. 
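The selection logic of this paragraph can be summarized as a toy Boolean model. This is a deliberately simplified sketch: the class `Cell` and the functions `fuse` and `survives_hat` are hypothetical names, and the model assumes that a hybrid simply inherits a functional salvage pathway from the B cell and indefinite growth from the myeloma cell.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    name: str
    has_hprt: bool   # functional nucleotide salvage pathway
    immortal: bool   # able to divide indefinitely in culture


def fuse(b_cell: Cell, myeloma: Cell) -> Cell:
    """Model a hybrid as carrying the functional traits of either parent."""
    return Cell(
        name="hybridoma",
        has_hprt=b_cell.has_hprt or myeloma.has_hprt,
        immortal=b_cell.immortal or myeloma.immortal,
    )


def survives_hat(cell: Cell) -> bool:
    """Long-term survival in HAT needs both salvage and immortality.

    With de novo synthesis blocked, cells lacking HPRT cannot make
    nucleotides; cells with HPRT but a limited life span die out in culture.
    """
    return cell.has_hprt and cell.immortal


if __name__ == "__main__":
    b_cell = Cell("B cell", has_hprt=True, immortal=False)
    myeloma = Cell("myeloma", has_hprt=False, immortal=True)
    for cell in (b_cell, myeloma, fuse(b_cell, myeloma)):
        print(cell.name, survives_hat(cell))
```

Only the hybrid satisfies both conditions in this model, mirroring the outcome described above.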

This culturing provides a population of hybridomas from which specific hybridomas are selected. 
Typically, selection of hybridomas is performed by culturing the cells by single-clone dilution in microtiter 
plates, followed by testing the individual clonal supernatants (after about two to three weeks) for the 
desired reactivity. The assay should be sensitive, simple and rapid, such as radioimmunoassays, enzyme 
immunoassays, cytotoxicity assays, plaque assays, dot immunobinding assays, and the like. 

The selected hybridomas would then be serially diluted and cloned into individual 
antibody-producing cell lines, which can then be propagated indefinitely to provide MAbs. The cell lines 
may be exploited for MAb production in two basic ways. A sample of the hybridoma can be injected 
(often into the peritoneal cavity) into a histocompatible animal of the type that was used to provide the 
somatic and myeloma cells for the original fusion. The injected animal develops tumors secreting the 
specific monoclonal antibody produced by the fused cell hybrid. The body fluids of the animal, such as 
serum or ascites fluid, can then be tapped to provide MAbs in high concentration. The individual cell lines 
could also be cultured in vitro, where the MAbs are naturally secreted into the culture medium from which 
they can be readily obtained in high concentrations. MAbs produced by either means may be further 
purified, if desired, using filtration, centrifugation and various chromatographic methods such as HPLC or 
affinity chromatography. 

Large amounts of the monoclonal antibodies of the present invention may also be obtained by 
multiplying hybridoma cells in vivo. Cell clones are injected into mammals that are histocompatible with
the parent cells, e.g., syngeneic mice, to cause growth of antibody-producing tumors. Optionally, the 
animals are primed with a hydrocarbon, especially oils such as pristane (tetramethylpentadecane) prior to 
injection. 

In accordance with the present invention, fragments of the monoclonal antibody of the invention 
can be obtained from the monoclonal antibody produced as described above, by methods which include 
digestion with enzymes such as pepsin or papain and/or cleavage of disulfide bonds by chemical reduction. 
Alternatively, monoclonal antibody fragments encompassed by the present invention can be synthesized 
using an automated peptide synthesizer, or by expression of full-length gene or of gene fragments in
E. coli.

The monoclonal conjugates of the present invention are prepared by methods known in the art, 
e.g., by reacting a monoclonal antibody prepared as described above with, for instance, an enzyme in the
presence of a coupling agent such as glutaraldehyde or periodate. Conjugates with fluorescein markers 
are prepared in the presence of these coupling agents or by reaction with an isothiocyanate. Conjugates 
with metal chelates are similarly produced. Other moieties to which antibodies may be conjugated include
radionuclides such as 3H, 125I, 131I, 32P, 14C, 51Cr, 36Cl, 57Co, 58Co, 59Fe, 75Se, 152Eu and 99mTc.
Radioactively labeled monoclonal antibodies of
the present invention are produced according to well-known methods in the art. For instance, monoclonal
antibodies can be iodinated by contact with sodium or potassium iodide and a chemical oxidizing agent 
such as sodium hypochlorite, or an enzymatic oxidizing agent, such as lactoperoxidase. Monoclonal 
antibodies according to the invention may be labeled with technetium-99m by ligand exchange process, for
example, by reducing pertechnetate with stannous solution, chelating the reduced technetium onto a
Sephadex column and applying the antibody to this column, or by direct labelling techniques, e.g., by
incubating pertechnetate, a reducing agent such as SnCl2, a buffer solution such as sodium-potassium
phthalate solution, and the antibody.

It will be appreciated by those of skill in the art that monoclonal or polyclonal antibodies specific
for HNF1α, HNF1β or HNF4α (the proteins that are mutated in MODY3, MODY4, and MODY1) will have
utilities in several types of applications. These can include the production of diagnostic kits for use in
detecting or diagnosing MODY3, MODY4, and MODY1 type diabetes. The skilled practitioner will realize
that such uses are within the scope of the present invention. 

J. Immunodetection Assays 

The immunodetection methods of the present invention have evident utility in the diagnosis of
conditions such as MODY3, MODY4, and MODY1 related NIDDM. Here, a biological or clinical sample
suspected of containing either the encoded protein or peptide or corresponding antibody is used.
However, these embodiments also have applications to non-clinical samples, such as in the titering of
antigen or antibody samples, in the selection of hybridomas, and the like.

In the clinical diagnosis or monitoring of patients with MODY3, MODY4 or MODY1, the detection
of an antigen encoded by an HNF1α nucleic acid, HNF4α nucleic acid, HNF1β nucleic acid, or a
decrease in the levels of such an antigen, in comparison to the levels in a corresponding biological sample
from a normal subject is indicative of a patient with MODY3, MODY4, or MODY1. The basis for such
diagnostic methods lies, in part, with the finding that the nucleic acid HNF1α, HNF1β and HNF4α
mutants identified in the present invention are responsible for MODY3, MODY4, and MODY1 related
diabetes, respectively. Hence, it can be inferred that at least some of these mutations produce elevated
levels of encoded proteins, that may also be used as markers for MODY3, MODY4 or MODY1.

Those of skill in the art are very familiar with differentiating between significant expression of a 
biomarker, which represents a positive identification, and low level or background expression of a
biomarker. Indeed, background expression levels are often used to form a "cutoff" above which
increased staining will be scored as significant or positive. Significant expression may be represented by 
high levels of antigens in tissues or within body fluids, or alternatively, by a high proportion of cells from 
within a tissue that each give a positive signal. 
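
The use of background expression to form a cutoff can be illustrated with a short sketch. The rule used here (mean of the background readings plus three sample standard deviations) is an assumption chosen for illustration only; the specification does not prescribe a particular cutoff rule, and the function names and readings are hypothetical.

```python
from statistics import mean, stdev


def background_cutoff(background_signals, n_sd=3.0):
    """Cutoff = mean + n_sd * sample standard deviation of background readings.

    The mean + 3 SD convention is an illustrative assumption, not a rule
    taken from the specification.
    """
    if len(background_signals) < 2:
        raise ValueError("need at least two background readings")
    return mean(background_signals) + n_sd * stdev(background_signals)


def score_samples(sample_signals, cutoff):
    """Label each sample 'positive' if its signal exceeds the cutoff."""
    return {
        name: ("positive" if signal > cutoff else "negative")
        for name, signal in sample_signals.items()
    }


def fraction_positive_cells(cell_signals, cutoff):
    """Proportion of cells within a tissue that each give a positive signal."""
    if not cell_signals:
        raise ValueError("no cells were scored")
    return sum(1 for s in cell_signals if s > cutoff) / len(cell_signals)


# Hypothetical staining intensities (arbitrary units).
background = [0.08, 0.10, 0.09, 0.11, 0.12]
cutoff = background_cutoff(background)
print(round(cutoff, 4))
print(score_samples({"sample_A": 0.45, "sample_B": 0.10}, cutoff))
print(fraction_positive_cells([0.05, 0.30, 0.40, 0.09], cutoff))
```

Either readout, signal level in a sample or the proportion of positive cells, can then be compared against whatever threshold a given laboratory has validated.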
1. Immunodetection Methods

In still further embodiments, the present invention concerns immunodetection methods for 
binding, purifying, removing, quantifying or otherwise generally detecting biological components. The 
encoded proteins or peptides of the present invention may be employed to detect antibodies having 
reactivity therewith, or, alternatively, antibodies prepared in accordance with the present invention, may 
be employed to detect the encoded proteins or peptides. The steps of various useful immunodetection 
methods have been described in the scientific literature, such as, e.g., Nakamura et al. (1987).

In general, the immunobinding methods include obtaining a sample suspected of containing a 
protein, peptide or antibody, and contacting the sample with an antibody or protein or peptide in 
accordance with the present invention, as the case may be, under conditions effective to allow the 
formation of immunocomplexes. 

The immunobinding methods include methods for detecting or quantifying the amount of a 
reactive component in a sample, which methods require the detection or quantitation of any immune 
complexes formed during the binding process. Here, one would obtain a sample suspected of containing a 
HNF1α or HNF4α mutant encoded protein, peptide or a corresponding antibody, and contact the sample
with an antibody or encoded protein or peptide, as the case may be, and then detect or quantify the 
amount of immune complexes formed under the specific conditions. 

In terms of antigen detection, the biological sample analyzed may be any sample that is suspected 
of containing an HNF1α, HNF1β or HNF4α antigen, such as a pancreatic β-cell, a homogenized tissue
extract, an isolated cell, a cell membrane preparation, separated or purified forms of any of the above 
protein-containing compositions, or even any biological fluid that comes into contact with diabetic tissue, 
including blood. 

Contacting the chosen biological sample with the protein, peptide or antibody under conditions 
effective and for a period of time sufficient to allow the formation of immune complexes (primary immune 
complexes) is generally a matter of simply adding the composition to the sample and incubating the 
mixture for a period of time long enough for the antibodies to form immune complexes with, i.e., to bind
to, any antigens present. After this time, the sample-antibody composition, such as a tissue section, 
ELISA plate, dot blot or western blot, will generally be washed to remove any non-specifically bound 
antibody species, allowing only those antibodies specifically bound within the primary immune complexes 
to be detected. 

In general, the detection of immunocomplex formation is well known in the art and may be
achieved through the application of numerous approaches. These methods are generally based upon the 
detection of a label or marker, such as any radioactive, fluorescent, biological or enzymatic tags or labels 
of standard use in the art. U.S. Patents concerning the use of such labels include 3,817,837; 3,850,752; 
3,939,350; 3,996,345; 4,277,437; 4,275,149 and 4,366,241, each incorporated herein by reference. 
Of course, one may find additional advantages through the use of a secondary binding ligand such as a 
second antibody or a biotin/avidin ligand binding arrangement, as is known in the art.

The encoded protein, peptide or corresponding antibody employed in the detection may itself be 
linked to a detectable label, wherein one would then simply detect this label, thereby allowing the amount 
of the primary immune complexes in the composition to be determined. 

Alternatively, the first added component that becomes bound within the primary immune 
complexes may be detected by means of a second binding ligand that has binding affinity for the encoded 
protein, peptide or corresponding antibody. In these cases, the second binding ligand may be linked to a 
detectable label. The second binding ligand is itself often an antibody, which may thus be termed a 
"secondary" antibody. The primary immune complexes are contacted with the labeled, secondary binding 
ligand, or antibody, under conditions effective and for a period of time sufficient to allow the formation of 
secondary immune complexes. The secondary immune complexes are then generally washed to remove 
any non-specifically bound labeled secondary antibodies or ligands, and the remaining label in the 
secondary immune complexes is then detected. 

Further methods include the detection of primary immune complexes by a two step approach. A 
second binding ligand, such as an antibody, that has binding affinity for the encoded protein, peptide or 
corresponding antibody is used to form secondary immune complexes, as described above. After 
washing, the secondary immune complexes are contacted with a third binding ligand or antibody that has 
binding affinity for the second antibody, again under conditions effective and for a period of time 
sufficient to allow the formation of immune complexes (tertiary immune complexes). The third ligand or
antibody is linked to a detectable label, allowing detection of the tertiary immune complexes thus formed.
This system may provide for signal amplification if desired. 
2. Immunohistochemistry 

The antibodies of the present invention may also be used in conjunction with both fresh-frozen 
and formalin-fixed, paraffin-embedded tissue blocks prepared for study by immunohistochemistry (IHC). 
For example, each tissue block consists of 50 mg of residual "pulverized" diabetic tissue. The method of 
preparing tissue blocks from these particulate specimens has been successfully used in previous IHC 
studies of various prognostic factors, and is well known to those of skill in the art (Brown et al., 1990;
Abbondanzo et al., 1990; Allred et al., 1990).

Briefly, frozen-sections may be prepared by rehydrating 50 mg of frozen "pulverized" diabetic
tissue at room temperature in phosphate buffered saline (PBS) in small plastic capsules; pelleting the
particles by centrifugation; resuspending them in a viscous embedding medium (OCT); inverting the
capsule and pelleting again by centrifugation; snap-freezing in -70°C isopentane; cutting the plastic
capsule and removing the frozen cylinder of tissue; securing the tissue cylinder on a cryostat microtome 
chuck; and cutting 25-50 serial sections. 

Permanent sections may be prepared by a similar method involving rehydration of the 50 mg 
sample in a plastic microfuge tube; pelleting; resuspending in 10% formalin for 4 hours fixation; 
washing/pelleting; resuspending in warm 2.5% agar; pelleting; cooling in ice water to harden the agar;
removing the tissue/agar block from the tube; infiltrating and embedding the block in paraffin; and cutting 
up to 50 serial permanent sections. 
3. ELISA 

As noted, it is contemplated that the encoded proteins or peptides of the invention will find utility
as immunogens, e.g., in connection with vaccine development, in immunohistochemistry and in ELISA
assays. One evident utility of the encoded antigens and corresponding antibodies is in immunoassays for
the detection of HNF1α, HNF1β and HNF4α mutant proteins, as needed in diagnosis and prognostic
monitoring of MODY. 

Immunoassays, in their most simple and direct sense, are binding assays. Certain preferred 
immunoassays are the various types of enzyme linked immunosorbent assays (ELISA) and 
radioimmunoassays (RIA) known in the art. Immunohistochemical detection using tissue sections is also
particularly useful. However, it will be readily appreciated that detection is not limited to such
techniques, and western blotting, dot blotting, FACS analyses, and the like may also be used.

In one exemplary ELISA, antibodies binding to the encoded proteins of the invention are 
immobilized onto a selected surface exhibiting protein affinity, such as a well in a polystyrene microtiter
plate. Then, a test composition suspected of containing the HNF1α, HNF1β or HNF4α mutant, such as a
clinical sample, is added to the wells. After binding and washing to remove non-specifically bound
immune complexes, the bound antigen may be detected. Detection is generally achieved by the addition
of a second antibody specific for the target protein, that is linked to a detectable label. This type of 
ELISA is a simple "sandwich ELISA". Detection may also be achieved by the addition of a second 
antibody, followed by the addition of a third antibody that has binding affinity for the second antibody, 
with the third antibody being linked to a detectable label. 

In another exemplary ELISA, the samples suspected of containing the mutant HNF1α, HNF1β or
HNF4α antigen are immobilized onto the well surface and then contacted with the antibodies of the
invention. After binding and washing to remove non-specifically bound immune complexes, the bound
antibody is detected. Where the initial antibodies are linked to a detectable label, the immune complexes
may be detected directly. Again, the immune complexes may be detected using a second antibody that 
has binding affinity for the first antibody, with the second antibody being linked to a detectable label. 

Another ELISA, in which the proteins or peptides are immobilized, involves the use of antibody
competition in the detection. In this ELISA, labeled antibodies are added to the wells, allowed to bind to
the mutant HNF1α protein, mutant HNF1β protein or mutant HNF4α protein, and detected by means of
their label. The amount of marker antigen in an unknown sample is then determined by mixing the sample
with the labeled antibodies before or during incubation with coated wells. The presence of marker
antigen in the sample acts to reduce the amount of antibody available for binding to the well and thus
reduces the ultimate signal. This is also appropriate for detecting antibodies in an unknown sample, where the
unlabeled antibodies bind to the antigen-coated wells and also reduce the amount of antigen available to
bind the labeled antibodies.

Irrespective of the format employed, ELISAs have certain features in common, such as coating,
incubating or binding, washing to remove non-specifically bound species, and detecting the bound immune 
complexes. These are described as follows: 

In coating a plate with either antigen or antibody, one will generally incubate the wells of the 
plate with a solution of the antigen or antibody, either overnight or for a specified period of hours. The 
wells of the plate will then be washed to remove incompletely adsorbed material. Any remaining available 
surfaces of the wells are then "coated" with a nonspecific protein that is antigenically neutral with 
regard to the test antisera. These include bovine serum albumin (BSA), casein and solutions of milk
powder. The coating of nonspecific adsorption sites on the immobilizing surface reduces the background 
caused by nonspecific binding of antisera to the surface. 

In ELISAs, it is probably more customary to use a secondary or tertiary detection means rather 
than a direct procedure. Thus, after binding of a protein or antibody to the well, coating with a non- 
reactive material to reduce background, and washing to remove unbound material, the immobilizing 
surface is contacted with the control MODY3, MODY4 or MODY1 and/or clinical or biological sample to
be tested under conditions effective to allow immune complex (antigen/antibody) formation. Detection of
the immune complex then requires a labeled secondary binding ligand or antibody, or a secondary binding
ligand or antibody in conjunction with a labeled tertiary antibody or third binding ligand.

"Under conditions effective to allow immune complex (antigen/antibody) formation" means that 
the conditions preferably include diluting the antigens and antibodies with solutions such as BSA, bovine 
gamma globulin (BGG) and phosphate buffered saline (PBS)/Tween™. These added agents also tend to 
assist in the reduction of nonspecific background. 

The "suitable" conditions also mean that the incubation is at a temperature and for a period of 
time sufficient to allow effective binding. Incubation steps are typically from about 1 to 2 to 4 hours, at 
temperatures preferably on the order of 25°C to 27°C, or may be overnight at about 4°C or so.

Following all incubation steps in an ELISA, the contacted surface is washed so as to remove
non-complexed material. A preferred washing procedure includes washing with a solution such as
PBS/Tween, or borate buffer. Following the formation of specific immune complexes between the test
sample and the originally bound material, and subsequent washing, the occurrence of even minute 
amounts of immune complexes may be determined. 

To provide a detecting means, the second or third antibody will have an associated label to allow
detection. Preferably, this label will be an enzyme that will generate color development upon incubating 
with an appropriate chromogenic substrate. Thus, for example, one will desire to contact and incubate 
the first or second immune complex with a urease, glucose oxidase, alkaline phosphatase or hydrogen
peroxidase-conjugated antibody for a period of time and under conditions that favor the development of
further immune complex formation (e.g., incubation for 2 hours at room temperature in a PBS-containing
solution such as PBS-Tween™).

After incubation with the labeled antibody, and subsequent to washing to remove unbound 
material, the amount of label is quantified, e.g., by incubation with a chromogenic substrate such as urea
and bromocresol purple or 2,2'-azino-di-(3-ethyl-benzthiazoline-6-sulfonic acid) [ABTS] and H2O2, in the
case of peroxidase as the enzyme label. Quantitation is then achieved by measuring the degree of color
generation, e.g., using a visible spectra spectrophotometer.
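
By way of illustration only, a measured absorbance might be converted to an antigen concentration by interpolating against readings from standards of known concentration. The sketch below assumes a monotonically increasing standard curve and simple piecewise-linear interpolation; neither the curve-fitting approach nor the example values come from the specification, and the function name is hypothetical.

```python
def interpolate_concentration(absorbance, standards):
    """Estimate concentration from absorbance by piecewise-linear interpolation.

    standards: iterable of (concentration, absorbance) pairs from wells of
    known concentration. Assumes absorbance rises with concentration
    (an illustrative assumption).
    """
    pts = sorted(standards, key=lambda p: p[1])
    if len(pts) < 2:
        raise ValueError("need at least two standards")
    if not pts[0][1] <= absorbance <= pts[-1][1]:
        raise ValueError("absorbance outside the range of the standard curve")
    for (c_lo, a_lo), (c_hi, a_hi) in zip(pts, pts[1:]):
        if a_lo <= absorbance <= a_hi:
            if a_hi == a_lo:  # flat segment: return its lower concentration
                return c_lo
            frac = (absorbance - a_lo) / (a_hi - a_lo)
            return c_lo + frac * (c_hi - c_lo)


# Hypothetical standards: (concentration in ng/ml, absorbance).
standard_curve = [(0.0, 0.05), (10.0, 0.25), (20.0, 0.45), (40.0, 0.85)]
print(interpolate_concentration(0.35, standard_curve))
```

Readings outside the range of the standards are rejected rather than extrapolated, since the color response of an enzyme label is generally not linear beyond the calibrated range.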

4. Use of Antibodies for Radioimaging

The antibodies of this invention will be used to quantify and localize the expression of the 
encoded marker proteins. The antibody, for example, will be labeled by any one of a variety of methods 
and used to visualize the localized concentration of the cells producing the encoded protein. Such an 
assay also will reveal the subcellular localization of the protein, which can have diagnostic and 
therapeutic applications. 

In accordance with this invention, the monoclonal antibody or fragment thereof may be labeled by 
any of several techniques known to the art. The methods of the present invention may also use 
paramagnetic isotopes for purposes of in vivo detection. Elements particularly useful in Magnetic
Resonance Imaging (MRI) include 157Gd, 55Mn, 162Dy, 52Cr, and 56Fe.

Administration of the labeled antibody may be local or systemic and accomplished intravenously, 
intraarterially, via the spinal fluid or the like. Administration may also be intradermal or intracavitary, 
depending upon the body site under examination. After a sufficient time has lapsed for the monoclonal 
antibody or fragment thereof to bind with the diseased tissue, for example, 30 minutes to 48 hours, the
area of the subject under investigation is examined by routine imaging techniques such as MRI, SPECT, 
planar scintillation imaging or newly emerging imaging techniques. The exact protocol will necessarily 
vary depending upon factors specific to the patient, as noted above, and depending upon the body site 
under examination, method of administration and type of label used; the determination of specific
procedures would be routine to the skilled artisan. The distribution of the bound radioactive isotope and 
its increase or decrease with time is then monitored and recorded. By comparing the results with data 
obtained from studies of clinically normal individuals, the presence and extent of the diseased tissue can 
be determined. 

It will be apparent to those of skill in the art that a similar approach may be used to radio-image 
the production of the encoded HNF1α, HNF1β or HNF4α mutant proteins in human patients. The present
invention provides methods for the in vivo diagnosis of MODY3, MODY4 or MODY1 in a patient. Such
methods generally comprise administering to a patient an effective amount of an HNF1α, HNF1β or
HNF4α mutant specific antibody, to which antibody is conjugated a marker, such as a radioactive isotope
or a spin-labeled molecule, that is detectable by non-invasive methods. The antibody-marker conjugate is 
allowed sufficient time to come into contact with reactive antigens that are present within the tissues of 
the patient, and the patient is then exposed to a detection device to identify the detectable marker. 
5. Kits 

In still further embodiments, the present invention concerns immunodetection kits for use with 
the immunodetection methods described above. As the encoded proteins or peptides may be employed to 
detect antibodies and the corresponding antibodies may be employed to detect encoded proteins or 
peptides, either or both of such components may be provided in the kit. The immunodetection kits will 
thus comprise, in suitable container means, an encoded protein or peptide, or a first antibody that binds to 
an encoded protein or peptide, and an immunodetection reagent. 

In certain embodiments, the encoded protein or peptide, or the first antibody that binds to the 
encoded protein or peptide, may be bound to a solid support, such as a column matrix or well of a 
microtiter plate. 

The immunodetection reagents of the kit may take any one of a variety of forms, including those 
detectable labels that are associated with or linked to the given antibody or antigen, and detectable labels 
that are associated with or attached to a secondary binding ligand. Exemplary secondary ligands are 
those secondary antibodies that have binding affinity for the first antibody or antigen, and secondary 
antibodies that have binding affinity for a human antibody. 

Further suitable immunodetection reagents for use in the present kits include the two-component
reagent that comprises a secondary antibody that has binding affinity for the first antibody or antigen, 
along with a third antibody that has binding affinity for the second antibody, the third antibody being 
linked to a detectable label. 

The kits may further comprise a suitably aliquoted composition of the encoded protein or
polypeptide antigen, whether labeled or unlabeled, as may be used to prepare a standard curve for a
detection assay. 

The kits may contain antibody-label conjugates either in fully conjugated form, in the form of 
intermediates, or as separate moieties to be conjugated by the user of the kit. The components of the
kits may be packaged either in aqueous media or in lyophilized form.

The container means of the kits will generally include at least one vial, test tube, flask, bottle, 
syringe or other container means, into which the antibody or antigen may be placed, and preferably, 
suitably aliquoted. Where a second or third binding ligand or additional component is provided, the kit will 
also generally contain a second, third or other additional container into which this ligand or component
may be placed. The kits of the present invention will also typically include a means for containing the
antibody, antigen, and any other reagent containers in close confinement for commercial sale. Such 
containers may include injection or blow-molded plastic containers into which the desired vials are 
retained. 

K. Detection and Quantitation of Nucleic Acid Species 

One embodiment of the instant invention comprises a method for identification of HNF1α, HNF1β
or HNF4α mutants in a biological sample by amplifying and detecting nucleic acids corresponding to
HNF1α, HNF1β or HNF4α mutants. The biological sample can be any tissue or fluid in which these
mutants might be present. Various embodiments include β- and α-cells of pancreatic islets, bone marrow
aspirate, bone marrow biopsy, lymph node aspirate, lymph node biopsy, spleen tissue, fine needle
aspirate, skin biopsy or organ tissue biopsy. Other embodiments include samples where the body fluid is
peripheral blood, lymph fluid, ascites, serous fluid, pleural effusion, sputum, cerebrospinal fluid, lacrimal 
fluid, stool or urine. 

Nucleic acid used as a template for amplification is isolated from cells contained in the biological 
sample, according to standard methodologies (Sambrook et al., 1989). The nucleic acid may be genomic
DNA or fractionated or whole cell RNA. Where RNA is used, it may be desired to convert the RNA to a
complementary DNA. In one embodiment, the RNA is whole cell RNA and is used directly as the template 
for amplification. 

Pairs of primers that selectively hybridize to nucleic acids corresponding to HNF1α, HNF1β or
HNF4α mutants are contacted with the isolated nucleic acid under conditions that permit selective
hybridization. Once hybridized, the nucleic acid:primer complex is contacted with one or more enzymes
that facilitate template-dependent nucleic acid synthesis. Multiple rounds of amplification, also referred 
to as "cycles," are conducted until a sufficient amount of amplification product is produced. 
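
How many cycles constitute "a sufficient amount" can be estimated with a simple model. The sketch below assumes idealized exponential amplification, in which each cycle multiplies the product by (1 + efficiency) with efficiency between 0 and 1; this model, the function names and the example figures are illustrative assumptions and are not taken from the specification.

```python
import math


def copies_after_cycles(initial_copies, cycles, efficiency=1.0):
    """Product copies after a number of cycles under idealized exponential growth."""
    return initial_copies * (1.0 + efficiency) ** cycles


def cycles_needed(initial_copies, target_copies, efficiency=1.0):
    """Smallest whole number of cycles predicted to reach target_copies."""
    if initial_copies <= 0 or target_copies <= 0:
        raise ValueError("copy numbers must be positive")
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    if target_copies <= initial_copies:
        return 0
    ratio = target_copies / initial_copies
    return math.ceil(math.log(ratio) / math.log(1.0 + efficiency))


# Hypothetical example: 100 template copies amplified toward 1e9 copies.
print(copies_after_cycles(100, 10))       # perfect doubling for 10 cycles
print(cycles_needed(100, 1e9))            # perfect doubling
print(cycles_needed(100, 1e9, 0.9))       # 90% per-cycle efficiency
```

In practice amplification plateaus as reagents are consumed, so such a calculation gives only a lower bound on the cycle number.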

Next, the amplification product is detected. In certain applications, the detection may be 
performed by visual means. Alternatively, the detection may involve indirect identification of the product 
via chemiluminescence, radioactive scintigraphy of incorporated radiolabel or fluorescent label or even via 
a system using electrical or thermal impulse signals (Affymax technology; Bellus, 1994). 

Following detection, one may compare the results seen in a given patient with a statistically 
significant reference group of normal patients, MODY-dependent diabetics and non-MODY-dependent
diabetics. In this way, it is possible to correlate the amount of HNF1α, HNF1β or
HNF4α mutants detected with various clinical states.
1. Primers

The term primer, as defined herein, is meant to encompass any nucleic acid that is capable of 
priming the synthesis of a nascent nucleic acid in a template-dependent process. Typically, primers are 
oligonucleotides from ten to twenty base pairs in length, but longer sequences can be employed. Primers 
may be provided in double-stranded or single-stranded form, although the single-stranded form is 
preferred. 

2. Template Dependent Amplification Methods

A number of template dependent processes are available to amplify the marker sequences present 
in a given template sample. One of the best known amplification methods is the polymerase chain
reaction (referred to as PCR) which is described in detail in U.S. Patent Nos. 4,683,195, 4,683,202 and
4,800,159, and in Innis et al., 1990, each of which is incorporated herein by reference in its entirety.

Briefly, in PCR, two primer sequences are prepared that are complementary to regions on
opposite complementary strands of the marker sequence. An excess of deoxynucleoside triphosphates
is added to a reaction mixture along with a DNA polymerase, e.g., Taq polymerase. If the marker
sequence is present in a sample, the primers will bind to the marker and the polymerase will cause the 
primers to be extended along the marker sequence by adding on nucleotides. By raising and lowering the 
temperature of the reaction mixture, the extended primers will dissociate from the marker to form 
reaction products, excess primers will bind to the marker and to the reaction products and the process is
repeated.
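
The relationship between the two primers and the amplified product can be sketched in code. The example below is an illustration under stated assumptions: exact string matching stands in for hybridization, and the template and primer sequences are invented placeholders rather than HNF1α, HNF1β or HNF4α sequences.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def predicted_amplicon(template, forward_primer, reverse_primer):
    """Top-strand product bounded by a primer pair, or None if a site is missing.

    forward_primer is identical to a stretch of the top strand, so it anneals
    to the bottom strand; reverse_primer is complementary to the top strand
    downstream of it. Only exact matches are considered (a simplification).
    """
    template = template.upper()
    start = template.find(forward_primer.upper())
    if start == -1:
        return None
    reverse_site = reverse_complement(reverse_primer)
    end = template.find(reverse_site, start + len(forward_primer))
    if end == -1:
        return None
    return template[start:end + len(reverse_site)]


# Hypothetical template and primers, for illustration only.
template = "TTGACCATGGCTAGCTAGGATCCGATTACGCAA"
forward = "ACCATGG"
reverse = reverse_complement("GATTACG")  # primer binding the opposite strand
print(reverse)
print(predicted_amplicon(template, forward, reverse))
```

Because the product itself contains both primer sites, each new strand can serve as a template in the following cycle, which is what makes the amplification exponential.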



A reverse transcriptase PCR amplification procedure may be performed in order to quantify the 
amount of mRNA amplified. Methods of reverse transcribing RNA into cDNA are well known and 
described in Sambrook et al., 1989. Alternative methods for reverse transcription utilize thermostable,
RNA-dependent DNA polymerases. These methods are described in WO 90/07641, filed December 21,
1990. Polymerase chain reaction methodologies are well known in the art. 

Another method for amplification is the ligase chain reaction ("LCR"), disclosed in EPA No. 320 
308, incorporated herein by reference in its entirety. In LCR, two complementary probe pairs are
prepared, and in the presence of the target sequence, each pair will bind to opposite complementary 
strands of the target such that they abut. In the presence of a ligase, the two probe pairs will link to 
form a single unit. By temperature cycling, as in PCR, bound ligated units dissociate from the target and 
then serve as "target sequences" for ligation of excess probe pairs. U.S. Patent 4,883,750 describes a 
method similar to LCR for binding probe pairs to a target sequence. 

Qbeta Replicase, described in PCT Application No. PCT/US87/00880, may also be used as still
another amplification method in the present invention. In this method, a replicative sequence of RNA that 
has a region complementary to that of a target is added to a sample in the presence of an RNA 
polymerase. The polymerase will copy the replicative sequence that can then be detected. 

An isothermal amplification method, in which restriction endonucleases and ligases are used to
achieve the amplification of target molecules that contain nucleotide 5'-[alpha-thio]-triphosphates in one
strand of a restriction site, may also be useful in the amplification of nucleic acids in the present invention
(Walker et al., 1992, incorporated herein by reference in its entirety).

Strand Displacement Amplification (SDA) is another method of carrying out isothermal
amplification of nucleic acids which involves multiple rounds of strand displacement and synthesis, i.e.,
nick translation. A similar method, called Repair Chain Reaction (RCR), involves annealing several probes
throughout a region targeted for amplification, followed by a repair reaction in which only two of the four
bases are present. The other two bases can be added as biotinylated derivatives for easy detection. A
similar approach is used in SDA. Target specific sequences can also be detected using a cyclic probe
reaction (CPR). In CPR, a probe having 3' and 5' sequences of non-specific DNA and a middle sequence of
specific RNA is hybridized to DNA that is present in a sample. Upon hybridization, the reaction is treated 
with RNase H, and the products of the probe identified as distinctive products that are released after 
digestion. The original template is annealed to another cycling probe and the reaction is repeated. 

Still other amplification methods, described in GB Application No. 2 202 328 and in PCT 
Application No. PCT/US89/01025, each of which is incorporated herein by reference in its entirety, may 
be used in accordance with the present invention. In the former application, "modified" primers are used 
in a PCR-like, template- and enzyme-dependent synthesis. The primers may be modified by labelling with 
a capture moiety (e.g., biotin) and/or a detector moiety (e.g., enzyme). In the latter application, an excess 
of labeled probes is added to a sample. In the presence of the target sequence, the probe binds and is 
cleaved catalytically. After cleavage, the target sequence is released intact to be bound by excess probe. 
Cleavage of the labeled probe signals the presence of the target sequence. 

Other nucleic acid amplification procedures include transcription-based amplification systems 
(TAS), including nucleic acid sequence based amplification (NASBA) and 3SR (Kwoh et al., 1989; 
Gingeras et al., PCT Application WO 88/10315, incorporated herein by reference in their entirety). In 
NASBA, the nucleic acids can be prepared for amplification by standard phenol/chloroform extraction, 
heat denaturation of a clinical sample, treatment with lysis buffer and minispin columns for isolation of 
DNA and RNA, or guanidinium chloride extraction of RNA. These amplification techniques involve 
annealing a primer which has target specific sequences. Following polymerization, DNA/RNA hybrids are 
digested with RNase H while double stranded DNA molecules are heat denatured again. In either case the 
single stranded DNA is made fully double stranded by addition of a second target specific primer, followed 
by polymerization. The double-stranded DNA molecules are then multiply transcribed by an RNA 
polymerase such as T7 or SP6. In an isothermal cyclic reaction, the RNAs are reverse transcribed into 
single stranded DNA, which is then converted to double stranded DNA, and then transcribed once again 
with an RNA polymerase such as T7 or SP6. The resulting products, whether truncated or complete, 
indicate target specific sequences. 




Davey et al., EPA No. 329 822 (incorporated herein by reference in its entirety) disclose a nucleic 
acid amplification process involving cyclically synthesizing single-stranded RNA ("ssRNA"), ssDNA, and 
double-stranded DNA (dsDNA), which may be used in accordance with the present invention. The ssRNA 
is a template for a first primer oligonucleotide, which is elongated by reverse transcriptase (RNA- 
dependent DNA polymerase). The RNA is then removed from the resulting DNA:RNA duplex by the action 
of ribonuclease H (RNase H, an RNase specific for RNA in duplex with either DNA or RNA). The resultant 
ssDNA is a template for a second primer, which also includes the sequences of an RNA polymerase 
promoter (exemplified by T7 RNA polymerase) 5' to its homology to the template. This primer is then 
extended by DNA polymerase (exemplified by the large "Klenow" fragment of E. coli DNA polymerase I), 
resulting in a double-stranded DNA ("dsDNA") molecule, having a sequence identical to that of the original 
RNA between the primers and having additionally, at one end, a promoter sequence. This promoter 
sequence can be used by the appropriate RNA polymerase to make many RNA copies of the DNA. These 
copies can then re-enter the cycle, leading to very swift amplification. With proper choice of enzymes, 
this amplification can be done isothermally without addition of enzymes at each cycle. Because of the 
cyclical nature of this process, the starting sequence can be chosen to be in the form of either DNA or 
RNA. 

Miller et al., PCT Application WO 89/06700 (incorporated herein by reference in its entirety) 
disclose a nucleic acid sequence amplification scheme based on the hybridization of a promoter/primer 
sequence to a target single-stranded DNA ("ssDNA") followed by transcription of many RNA copies of the 
sequence. This scheme is not cyclic, i.e., new templates are not produced from the resultant RNA 
transcripts. Other amplification methods include "RACE" and "one-sided PCR" (Frohman, M.A., In: PCR 
PROTOCOLS: A GUIDE TO METHODS AND APPLICATIONS, Academic Press, N.Y., 1990; Ohara et al., 
1989; each herein incorporated by reference in their entirety). 

Methods based on ligation of two (or more) oligonucleotides in the presence of nucleic acid having 
the sequence of the resulting "di-oligonucleotide", thereby amplifying the di-oligonucleotide, may also be 
used in the amplification step of the present invention (Wu et al., 1989, incorporated herein by reference 
in its entirety). 

J. RNase Protection Assay 

Methods for genetic screening by identifying mutations associated with most genetic diseases 
such as diabetes must be able to assess large regions of the genome. Once a relevant mutation has been 
identified in a given patient, other family members and affected individuals can be screened using 
methods which are targeted to that site. The ability to detect dispersed point mutations is critical for 
genetic counseling, diagnosis, and early clinical intervention as well as for research into the etiology of 
cancer and other genetic disorders. The ideal method for genetic screening would quickly, inexpensively, 
and accurately detect all types of widely dispersed mutations in genomic DNA, cDNA, and RNA samples, 
depending on the specific situation. 

Historically, a number of different methods have been used to detect point mutations, including 
denaturing gradient gel electrophoresis ("DGGE"), restriction enzyme polymorphism analysis, chemical and 
enzymatic cleavage methods, and others (Cotton, 1989). The more common procedures currently in use 
include direct sequencing of target regions amplified by PCR™ and single-strand conformation 
polymorphism analysis ("SSCP"). 

Another method of screening for point mutations is based on RNase cleavage of base pair 
mismatches in RNA/DNA and RNA/RNA heteroduplexes. As used herein, the term "mismatch" is defined 
as a region of one or more unpaired or mispaired nucleotides in a double-stranded RNA/RNA, RNA/DNA or 
DNA/DNA molecule. This definition thus includes mismatches due to insertion/deletion mutations, as well 
as single and multiple base point mutations. U.S. Patent No. 4,946,773 describes an RNase A mismatch 
cleavage assay that involves annealing single-stranded DNA or RNA test samples to an RNA probe, and 
subsequent treatment of the nucleic acid duplexes with RNase A. After the RNase cleavage reaction, the 
RNase is inactivated by proteolytic digestion and organic extraction, and the cleavage products are 
denatured by heating and analyzed by electrophoresis on denaturing polyacrylamide gels. For the 
detection of mismatches, the single-stranded products of the RNase A treatment, electrophoretically 
separated according to size, are compared to similarly treated control duplexes. Samples containing 
smaller fragments (cleavage products) not seen in the control duplex are scored as positive (+). 

Currently available RNase mismatch cleavage assays, including those performed according to 
U.S. Patent No. 4,946,773, require the use of radiolabeled RNA probes. Myers and Maniatis in U.S. 
Patent No. 4,946,773 describe the detection of base pair mismatches using RNase A. Other 
investigators have described the use of an E. coli enzyme, RNase I, in mismatch assays. Because it has 
broader cleavage specificity than RNase A, RNase I would be a desirable enzyme to employ in the 
detection of base pair mismatches if components can be found to decrease the extent of non-specific 
cleavage and increase the frequency of cleavage of mismatches. The use of RNase I for mismatch 
detection is described in literature from Promega Biotech. Promega markets a kit containing RNase I that 
is shown in their literature to cleave three out of four known mismatches, provided the enzyme level is 
sufficiently high. 

The RNase protection assay as first described by Melton et al. (1984) was used to detect and 
map the ends of specific mRNA targets in solution. The assay relies on being able to easily generate high 
specific activity radiolabeled RNA probes complementary to the mRNA of interest by in vitro 
transcription. Originally, the templates for in vitro transcription were recombinant plasmids containing 
bacteriophage promoters. The probes are mixed with total cellular RNA samples to permit hybridization 
to their complementary targets, then the mixture is treated with RNase to degrade excess unhybridized 
probe. Also, as originally intended, the RNase used is specific for single stranded RNA, so that hybridized 
double-stranded probe is protected from degradation. After inactivation and removal of the RNase, the 
protected probe (which is proportional in amount to the amount of target mRNA that was present) is 
recovered and analyzed on a polyacrylamide gel. 

The RNase Protection assay was adapted for detection of single base mutations by Myers and 
Maniatis (1985) and by Winter and Perucho (1985). In this type of RNase A mismatch cleavage assay, 
radiolabeled RNA probes transcribed in vitro from wild type sequences, are hybridized to complementary 
target regions derived from test samples. The test target generally comprises DNA (either genomic DNA 
or DNA amplified by cloning in plasmids or by PCR™), although RNA targets (endogenous mRNA) have 
occasionally been used (Gibbs and Caskey, 1987; Winter et al., 1985). If single nucleotide (or greater) 
sequence differences occur between the hybridized probe and target, the resulting disruption in Watson- 
Crick hydrogen bonding at that position ("mismatch") can be recognized and cleaved in some cases by 
single-strand specific ribonuclease. To date, RNase A has been used almost exclusively for cleavage of 
single-base mismatches, although RNase I has recently been shown to be useful also for mismatch cleavage. 
There are recent descriptions of using the MutS protein and other DNA-repair enzymes for detection of 
single-base mismatches (Ellis et al., 1994; Lishanski et al., 1994). 

By hybridizing each strand of the wild type probe in RNase cleavage mismatch assays separately 
to the complementary sense and antisense strands of the test target, two different complementary 
mismatches (for example, A-C and G-U or G-T), and therefore two chances for detecting each mutation by 
separate cleavage events, are provided. Myers et al. (1985) used the RNase A cleavage assay to screen 
615 bp regions of the human β-globin gene contained in recombinant plasmid targets. By probing with 
both strands, they were able to detect most, but not all, of the β-globin mutations in their model system. 
The collection of mutants included examples of all the 12 possible types of mismatches between RNA and 
DNA: rA/dA, rC/dC, rU/dC, rC/dA, rC/dT, rU/dG, rG/dA, rG/dG, rU/dT, rA/dC, rG/dT, and rA/dG. 

Myers et al. (1985) showed that certain types of mismatch were more frequently and more 
completely cleaved by RNase A than others. For example, the rC/dA, rC/dC, and rC/dT mismatches were 
cleaved in all cases, while the rG/dA mismatch was only cleaved in 13% of the cases tested and the 
rG/dT mismatch was almost completely resistant to cleavage. In general, the complement of a difficult- 
to-detect mismatch was much easier to detect. For example, the refractory rG/dT mismatch generated by 
probing a G to A mutant target with a wild type sense-strand probe is complemented by the easily 
cleaved rC/dA mismatch generated by probing the mutant target with the wild type antisense strand. By 
probing both target strands, Myers and Maniatis (1986) estimated that at least 50% of all single-base 
mutations would be detected by the RNase A cleavage assay. These authors stated that approximately 
one-third of all possible types of single-base substitutions would be detected by using a single probe for 
just one strand of the target DNA (Myers et al., 1985). 
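
The pairing of a point mutation with its two complementary mismatches can be illustrated with a short sketch. The Python below is illustrative only and is not taken from the cited assays; the function name `probe_mismatches` and the convention that both bases are given for the sense strand are assumptions made for this example. It reproduces the G to A case discussed above and enumerates the mismatch types produced by all twelve single-base substitutions.

```python
# Illustrative sketch: which RNA/DNA mismatches arise when wild type RNA probes
# are hybridized to each strand of a mutant DNA target.
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def to_rna(base):
    """Return the RNA equivalent of a DNA base (T becomes U)."""
    return "U" if base == "T" else base


def probe_mismatches(wild_type, mutant):
    """Return (sense-probe mismatch, antisense-probe mismatch).

    `wild_type` and `mutant` are the sense-strand DNA bases at the mutated
    position (an assumed convention for this example).
    """
    # The sense-strand probe carries the wild type base and pairs with the
    # antisense strand of the mutant target.
    sense = "r{}/d{}".format(to_rna(wild_type), DNA_COMPLEMENT[mutant])
    # The antisense-strand probe carries the complement of the wild type base
    # and pairs with the sense strand of the mutant target.
    antisense = "r{}/d{}".format(to_rna(DNA_COMPLEMENT[wild_type]), mutant)
    return sense, antisense


# The G to A example from the text.
g_to_a = probe_mismatches("G", "A")

# Every single-base substitution, probed on both strands.
all_types = set()
for wt in "ACGT":
    for mut in "ACGT":
        if wt != mut:
            all_types.update(probe_mismatches(wt, mut))
```

Under these assumptions `g_to_a` holds the refractory mismatch and its readily cleaved complement, and `all_types` contains one entry for each of the twelve RNA/DNA mismatch types.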

In the typical RNase cleavage assays, the separating gels are run under denaturing conditions for 
analysis of the cleavage products. This requires the RNase to be inactivated by treating the reaction 
with protease (usually Proteinase K, often in the presence of SDS) to degrade the RNase. This reaction is 
generally followed by an organic extraction with a phenol/chloroform solution to remove proteins and 
residual RNase activity. The organic extraction is then followed by concentration and recovery of the 
cleavage products by alcohol precipitation (Myers et al., 1985; Winter et al., 1985; Theophilus et al., 
1989). 

4. Separation Methods 

Following amplification, it may be desirable to separate the amplification product from the 
template and the excess primer for the purpose of determining whether specific amplification has 
occurred. In one embodiment, amplification products are separated by agarose, agarose-acrylamide or 
polyacrylamide gel electrophoresis using standard methods. See Sambrook et al., 1989. 

Alternatively, chromatographic techniques may be employed to effect separation. There are 
many kinds of chromatography which may be used in the present invention: adsorption, partition, ion- 
exchange and molecular sieve, and many specialized techniques for using them including column, paper, 
thin-layer and gas chromatography (Freifelder, 1982). 

5. Identification Methods 

Amplification products must be visualized in order to confirm amplification of the marker 
sequences. One typical visualization method involves staining of a gel with ethidium bromide and 
visualization under UV light. Alternatively, if the amplification products are integrally labeled with radio- 
or fluorometrically-labeled nucleotides, the amplification products can then be exposed to x-ray film or 
visualized under the appropriate stimulating spectra, following separation. 

In one embodiment, visualization is achieved indirectly. Following separation of amplification 
products, a labeled, nucleic acid probe is brought into contact with the amplified marker sequence. The 
probe preferably is conjugated to a chromophore but may be radiolabeled. In another embodiment, the 
probe is conjugated to a binding partner, such as an antibody or biotin, and the other member of the 
binding pair carries a detectable moiety. 

In one embodiment, detection is by Southern blotting and hybridization with a labeled probe. The 
techniques involved in Southern blotting are well known to those of skill in the art and can be found in 
many standard books on molecular protocols. See Sambrook et al., 1989. Briefly, amplification products 
are separated by gel electrophoresis. The gel is then contacted with a membrane, such as nitrocellulose, 
permitting transfer of the nucleic acid and non-covalent binding. Subsequently, the membrane is 
incubated with a chromophore-conjugated probe that is capable of hybridizing with a target amplification 
product. Detection is by exposure of the membrane to x-ray film or ion-emitting detection devices. 

One example of the foregoing is described in U.S. Patent No. 5,279,721, incorporated by 
reference herein, which discloses an apparatus and method for the automated electrophoresis and 
transfer of nucleic acids. The apparatus permits electrophoresis and blotting without external 
manipulation of the gel and is ideally suited to carrying out methods according to the present invention. 

6. Kit Components 

All the essential materials and reagents required for detecting MODY markers in a biological 
sample may be assembled together in a kit. This generally will comprise preselected primers for specific 
markers. Also included may be enzymes suitable for amplifying nucleic acids, including various 
polymerases (RT, Taq, etc.), deoxynucleotides and buffers to provide the necessary reaction mixture for 
amplification. 

Such kits generally will comprise, in suitable means, distinct containers for each individual 
reagent and enzyme as well as for each marker primer pair. Preferred pairs of primers for amplifying 
nucleic acids are selected to amplify the sequences specified in SEQ ID NO:3, SEQ ID NO:5 or SEQ ID 
NO:5, along with the cDNAs for HNF1α (SEQ ID NO:1), HNF1β (SEQ ID NO:128) and HNF4α (SEQ ID 
NO:78). In other embodiments, preferred pairs of primers for amplification are selected to amplify 
sequences specified in SEQ ID NO:34, SEQ ID NO:36, SEQ ID NO:38, SEQ ID NO:40, SEQ ID NO:42, SEQ ID 
NO:44, SEQ ID NO:46, SEQ ID NO:48, SEQ ID NO:50, SEQ ID NO:52, SEQ ID NO:54. 

In another embodiment, such kits will comprise hybridization probes specific for MODY3, chosen 
from a group including nucleic acids corresponding to the sequences specified in SEQ ID NO:1, SEQ ID 
NO:3, SEQ ID NO:5, and SEQ ID NO:7, along with the cDNAs for HNF1α (SEQ ID NO:1). In yet another 
embodiment, such kits will comprise probes specific for MODY1 chosen from a group including nucleic 
acids corresponding to the sequences specified in SEQ ID NO:78, SEQ ID NO:34, SEQ ID NO:36, SEQ ID 
NO:38, SEQ ID NO:40, SEQ ID NO:42, SEQ ID NO:44, SEQ ID NO:46, SEQ ID NO:48, SEQ ID NO:50, SEQ ID 
NO:52, SEQ ID NO:54, HNF4α. In still another embodiment, such kits will comprise probes specific for 
MODY4 chosen from a group including nucleic acids corresponding to the sequences specified in SEQ ID 
NO:128, HNF1β or any of the exons shown in FIG. 27A-FIG. 27I, or Genbank accession numbers U90279- 
90287 and U96079, incorporated herein by reference. 

Such kits generally will comprise, in suitable means, distinct containers for each individual 
reagent and enzyme as well as for each marker hybridization probe. 

L. Use of RNA Fingerprinting to Identify MODY3, MODY4, and MODY1 Markers 

RNA fingerprinting is a means by which RNAs isolated from many different tissues, cell types or 
treatment groups can be sampled simultaneously to identify RNAs whose relative abundances vary. Two 
forms of this technology were developed simultaneously and reported in 1992 as RNA fingerprinting by 
differential display (Liang and Pardee, 1992; Welsh et al., 1992). (See also Liang and Pardee, U.S. Patent 
5,262,311, incorporated herein by reference in its entirety.) Some of the experiments described herein 
were performed similarly to Donahue et al., J. Biol. Chem. 269: 8604-8609, 1994. 

All forms of RNA fingerprinting by PCR are theoretically similar but differ in their primer design 
and application. The most striking difference between differential display and other methods of RNA 
fingerprinting is that differential display utilizes anchoring primers that hybridize to the poly A tails of 
mRNAs. As a consequence, the PCR products amplified in differential display are biased towards the 3' 
untranslated regions of mRNAs. 

The basic technique of differential display has been described in detail (Liang and Pardee, 1992). 
Total cell RNA is primed for first strand reverse transcription with an anchoring primer composed of oligo 
dT and any two of the four deoxynucleosides. The oligo dT primer is extended using a reverse 
transcriptase, for example, Moloney Murine Leukemia Virus (MMLV) reverse transcriptase. The synthesis 
of the second strand is primed with an arbitrarily chosen oligonucleotide, using reduced stringency 
conditions. Once the double-stranded cDNA has been synthesized, amplification proceeds by standard 
PCR techniques, utilizing the same primers. The resulting DNA fingerprint is analyzed by gel 
electrophoresis and ethidium bromide staining or autoradiography. A side by side comparison of 
fingerprints obtained from, for example, tumor versus normal tissue samples using the same oligonucleotide 
primers identifies mRNAs that are differentially expressed. 

RNA fingerprinting technology has been demonstrated as being effective in identifying genes that 
are differentially expressed in cancer (Liang et al., 1992; Wong et al., 1993; Sager et al., 1993; Mok et 
al., 1994; Watson et al., 1994; Chen et al., 1995; An et al., 1995). The present invention utilizes the 
RNA fingerprinting technique to identify genes that are differentially expressed in diabetes. 

Design and Theoretical Considerations for Relative Quantitative RT-PCR 

Reverse transcription (RT) of RNA to cDNA followed by relative quantitative PCR (RT-PCR) can 
be used to determine the relative concentrations of specific mRNA species isolated from MODY3, 
MODY4, and MODY1 patients. By determining that the concentration of a specific mRNA species varies, 
it is shown that the gene encoding the specific mRNA species is differentially expressed. This technique 
can be used to confirm that mRNA transcripts shown to be differentially regulated by RNA fingerprinting 
are differentially expressed in MODY related diabetes. 

In PCR, the number of molecules of the amplified target DNA increases by a factor approaching 
two with every cycle of the reaction until some reagent becomes limiting. Thereafter, the rate of 
amplification becomes increasingly diminished until there is no increase in the amplified target between 
cycles. If a graph is plotted in which the cycle number is on the X axis and the log of the concentration of 
the amplified target DNA is on the Y axis, a curved line of characteristic shape is formed by connecting 
the plotted points. Beginning with the first cycle, the slope of the line is positive and constant. This is 
said to be the linear portion of the curve. After a reagent becomes limiting, the slope of the line begins to 
decrease and eventually becomes zero. At this point the concentration of the amplified target DNA 
becomes asymptotic to some fixed value. This is said to be the plateau portion of the curve. 

The concentration of the target DNA in the linear portion of the PCR amplification is directly 
proportional to the starting concentration of the target before the reaction began. By determining the 
concentration of the amplified products of the target DNA in PCR reactions that have completed the same 
number of cycles and are in their linear ranges, it is possible to determine the relative concentrations of 
the specific target sequence in the original DNA mixture. If the DNA mixtures are cDNAs synthesized 
from RNAs isolated from different tissues or cells, the relative abundances of the specific mRNA from 
which the target sequence was derived can be determined for the respective tissues or cells. This direct 
proportionality between the concentration of the PCR products and the relative mRNA abundances is only 
true in the linear range of the PCR reaction. 

The final concentration of the target DNA in the plateau portion of the curve is determined by the 
availability of reagents in the reaction mix and is independent of the original concentration of target DNA. 
Therefore, the first condition that must be met before the relative abundances of a mRNA species can be 
determined by RT-PCR for a collection of RNA populations is that the concentrations of the amplified PCR 
products must be sampled when the PCR reactions are in the linear portion of their curves. 
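
The distinction between sampling in the linear portion and in the plateau portion can be illustrated numerically. The following Python sketch is not a model taken from this disclosure; it assumes a simple logistic-style limit in which the per-cycle gain falls toward zero as product approaches a fixed `capacity`, and the function name `simulate_pcr`, the parameter values and the two starting copy numbers are all hypothetical choices made for the example.

```python
# Illustrative sketch of PCR product accumulation under an assumed
# reagent-limited (logistic-style) model.
def simulate_pcr(start_copies, cycles, efficiency=1.0, capacity=1e12):
    """Return the product copy number before cycling and after each cycle.

    With efficiency=1.0 the product doubles each cycle while reagents are in
    excess; the gain shrinks as the product approaches `capacity`.
    """
    copies = [float(start_copies)]
    for _ in range(cycles):
        current = copies[-1]
        gain = efficiency * max(0.0, 1.0 - current / capacity)
        copies.append(current * (1.0 + gain))
    return copies


# Two hypothetical samples whose starting target amounts differ tenfold.
low = simulate_pcr(1e3, 45)
high = simulate_pcr(1e4, 45)

# Sampled early, both reactions are still in the linear portion of the
# log plot, so the product ratio reflects the starting ratio.
ratio_linear = high[15] / low[15]

# Sampled late, both reactions have reached the plateau, so the product
# ratio no longer carries information about the starting amounts.
ratio_plateau = high[45] / low[45]
```

In this sketch the ratio sampled at cycle 15 stays close to the tenfold starting difference, while the ratio sampled at cycle 45 collapses toward one, which is the reason the first condition above requires sampling in the linear portion of the curves.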

The second condition that must be met for an RT-PCR experiment to successfully determine the 
relative abundances of a particular mRNA species is that relative concentrations of the amplifiable cDNAs 
must be normalized to some independent standard. The goal of an RT-PCR experiment is to determine the 
abundance of a particular mRNA species relative to the average abundance of all mRNA species in the 
sample. In the experiments described below, mRNAs for β-actin, asparagine synthetase and lipocortin II 
were used as external and internal standards to which the relative abundance of other mRNAs is 
compared. 
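
The normalization step can be sketched as a simple ratio of target signal to standard signal within each sample. The Python below is a minimal illustration only; the function name `relative_abundance`, the sample labels and every signal value are invented for the example and do not represent data from the experiments described herein.

```python
# Illustrative sketch: normalize a target PCR signal to a standard
# (for example, a housekeeping mRNA) measured in the same sample.
def relative_abundance(target_signal, standard_signal):
    """Return the target signal expressed relative to the standard signal."""
    if standard_signal <= 0:
        raise ValueError("standard signal must be positive")
    return target_signal / standard_signal


# Hypothetical band intensities in arbitrary units (not measured data).
samples = {
    "control": {"target": 1200.0, "standard": 24000.0},
    "patient": {"target": 300.0, "standard": 20000.0},
}

normalized = {
    name: relative_abundance(values["target"], values["standard"])
    for name, values in samples.items()
}

# Ratio of normalized values between samples; values below one indicate lower
# relative expression in the hypothetical patient sample.
fold_change = normalized["patient"] / normalized["control"]
```

Because each target signal is divided by the standard from the same sample, differences in the amount of input cDNA between samples cancel, provided both products were sampled in their linear ranges.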

Most protocols for competitive PCR utilize internal PCR standards that are approximately as 
abundant as the target. These strategies are effective if the products of the PCR amplifications are 
sampled during their linear phases. If the products are sampled when the reactions are approaching the 
plateau phase, then the less abundant product becomes relatively overrepresented. Comparisons of 
relative abundances made for many different RNA samples, such as is the case when examining RNA 
samples for differential expression, become distorted in such a way as to make differences in relative 
abundances of RNAs appear less than they actually are. This is not a significant problem if the internal 
standard is much more abundant than the target. If the internal standard is more abundant than the 
target, then direct linear comparisons can be made between RNA samples. 

The above discussion describes theoretical considerations for an RT-PCR assay for clinically 
derived materials. The problems inherent in clinical samples are that they are of variable quantity (making 
normalization problematic), and that they are of variable quality (necessitating the co-amplification of a 
reliable internal control, preferably of larger size than the target). Both of these problems are overcome 
if the RT-PCR is performed as a relative quantitative RT-PCR with an internal standard in which the 
internal standard is an amplifiable cDNA fragment that is larger than the target cDNA fragment and in 
which the abundance of the mRNA encoding the internal standard is roughly 5-100 fold higher than the 
mRNA encoding the target. This assay measures relative abundance, not absolute abundance of the 
respective mRNA species. 

Other studies may be performed using a more conventional relative quantitative RT-PCR assay 
with an external standard protocol. These assays sample the PCR products in the linear portion of their 
amplification curves. The number of PCR cycles that are optimal for sampling must be empirically 
determined for each target cDNA fragment. In addition, the reverse transcriptase products of each RNA 
population isolated from the various tissue samples must be carefully normalized for equal concentrations 
of amplifiable cDNAs. This consideration is very important since the assay measures absolute mRNA 
abundance. Absolute mRNA abundance can be used as a measure of differential gene expression only in 
normalized samples. While empirical determination of the linear range of the amplification curve and 
normalization of cDNA preparations are tedious and time consuming processes, the resulting RT-PCR 
assays can be superior to those derived from the relative quantitative RT-PCR assay with an internal 
standard. 
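
One way the empirical determination of the linear range might be carried out is to measure product at successive cycles and find the last cycle at which the slope of the log plot still matches the initial slope. The Python sketch below is an assumed procedure, not one specified in this disclosure; the function name `last_linear_cycle`, the 10% tolerance and the synthetic signal series are hypothetical.

```python
# Illustrative sketch: locate the end of the linear portion of a PCR
# amplification curve from product measurements taken at successive cycles.
import math


def last_linear_cycle(signals, tolerance=0.1):
    """Return the index of the last measurement still in the linear range.

    `signals[i]` is the product amount at measurement i. The slope of
    log10(signal) between consecutive measurements is compared with the
    first slope; the linear range ends when a slope deviates from the first
    slope by more than `tolerance` (as a fraction of the first slope).
    """
    logs = [math.log10(value) for value in signals]
    slopes = [later - earlier for earlier, later in zip(logs, logs[1:])]
    reference = slopes[0]
    last = 0
    for index, slope in enumerate(slopes):
        if abs(slope - reference) > tolerance * abs(reference):
            break
        last = index + 1
    return last


# Synthetic series: doubling for nine steps, then flattening toward a plateau.
signals = [2.0 ** cycle for cycle in range(10)]
signals += [2.0 ** 9 * 1.5, 2.0 ** 9 * 1.8, 2.0 ** 9 * 1.9]

linear_end = last_linear_cycle(signals)
```

In practice the measurements would come from aliquots removed at successive cycles for each target cDNA fragment, and the tolerance would be chosen to suit the noise of the detection method.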

One reason for this advantage is that without the internal standard/competitor, all of the 
reagents can be converted into a single PCR product in the linear range of the amplification curve, thus 
increasing the sensitivity of the assay. Another reason is that with only one PCR product, display of the 
product on an electrophoretic gel or another display method becomes less complex, has less background 
and is easier to interpret. 

M. Methods for Activation of Gene Expression 

In one embodiment of the present invention, there are provided methods for increasing gene 
expression or activation in a cell. This is particularly useful where there is an aberration in the gene product 
or gene expression is not sufficient for normal function. This will allow for the alleviation of symptoms of 
MODY3 type diabetes experienced as a result of mutation in HNF1α, MODY4 type diabetes experienced as a 
result of mutation in HNF1β and MODY1 type diabetes experienced as a result of mutation in HNF4α. 

The general approach to increasing gene expression as mediated by HNF1α, HNF1β or HNF4α 
according to the present invention will be to provide a cell with an HNF1α, HNF1β or HNF4α polypeptide, 
thereby permitting the transcription promotional activity of HNF1α, HNF1β or HNF4α to take effect. While 
it is conceivable that the protein may be delivered directly, a preferred embodiment involves providing a 
nucleic acid encoding an HNF1α, HNF1β or HNF4α polypeptide, i.e., an HNF1α, HNF1β or HNF4α gene, to 
the cell. Following this provision, the HNF1α, HNF1β or HNF4α polypeptide is synthesized by the host cell's 
transcriptional and translational machinery, as well as any that may be provided by the expression construct. 
Cis-acting regulatory elements necessary to support the expression of the HNF1α, HNF1β or HNF4α gene 
will be provided, in the form of an expression construct. It also is possible that expression of the virally- 
encoded HNF1α, HNF1β or HNF4α could be stimulated or enhanced, or the expressed polypeptide 
stabilized, thereby achieving the same or similar effect. 

In order to effect expression of constructs encoding HNF1α, HNF1β or HNF4α genes, the 
expression construct must be delivered into a cell. One mechanism for delivery is via viral infection, where 
the expression construct is encapsidated in a viral particle which will deliver either a replicating or non- 
replicating nucleic acid. In certain embodiments an HSV vector is used, although virtually any vector would 
suffice. 

Several non-viral methods for the transfer of expression constructs into cultured mammalian cells 
also are contemplated by the present invention. These include calcium phosphate precipitation (Graham and 
Van Der Eb, 1973; Chen and Okayama, 1987; Rippe et al., 1990), DEAE-dextran (Gopal, 1985), 
electroporation (Tur-Kaspa et al., 1986; Potter et al., 1984), direct microinjection (Harland and Weintraub, 
1985), DNA-loaded liposomes (Nicolau and Sene, 1982; Fraley et al., 1979) and lipofectamine-DNA 
complexes, cell sonication (Fechheimer et al., 1987), gene bombardment using high velocity microprojectiles 
(Yang et al., 1990), and receptor-mediated transfection (Wu and Wu, 1987; Wu and Wu, 1988). Some of 
these techniques may be successfully adapted for in vivo or ex vivo use, as discussed below. 

In another embodiment of the invention, the expression construct may simply consist of naked 
recombinant DNA or plasmids. Transfer of the construct may be performed by any of the methods 
mentioned above which physically or chemically permeabilize the cell membrane. This is particularly 
applicable for transfer in vitro, but it may be applied to in vivo use as well. Another embodiment of the 
invention for transferring a naked DNA expression construct into cells may involve particle bombardment. 
This method depends on the ability to accelerate DNA coated microprojectiles to a high velocity allowing 
them to pierce cell membranes and enter cells without killing them (Klein et al., 1987). Several devices for 
accelerating small particles have been developed. One such device relies on a high voltage discharge to 
generate an electrical current, which in turn provides the motive force (Yang et al., 1990). The 
microprojectiles used have consisted of biologically inert substances such as tungsten or gold beads. 

In a further embodiment of the invention, the expression construct may be entrapped in a liposome. 
Liposomes are vesicular structures characterized by a phospholipid bilayer membrane and an inner aqueous 
medium. Multilamellar liposomes have multiple lipid layers separated by aqueous medium. They form 
spontaneously when phospholipids are suspended in an excess of aqueous solution. The lipid components 
undergo self-rearrangement before the formation of closed structures and entrap water and dissolved
solutes between the lipid bilayers (Ghosh and Bachhawat, 1991). Also contemplated are lipofectamine-DNA
complexes.

Liposome-mediated nucleic acid delivery and expression of foreign DNA in vitro has been very
successful. Wong et al. (1980) demonstrated the feasibility of liposome-mediated delivery and expression of
foreign DNA in cultured chick embryo, HeLa and hepatoma cells. In certain embodiments of the invention,
the liposome may be complexed with a hemagglutinating virus (HVJ). This has been shown to facilitate
fusion with the cell membrane and promote cell entry of liposome-encapsulated DNA (Kaneda et al., 1989).
In other embodiments, the liposome may be complexed or employed in conjunction with nuclear non-histone
chromosomal proteins (HMG-1) (Kato et al., 1991). In yet further embodiments, the liposome may be
complexed or employed in conjunction with both HVJ and HMG-1. In other embodiments, the delivery vehicle
may comprise a ligand and a liposome. Where a bacterial promoter is employed in the DNA construct, it also
will be desirable to include within the liposome an appropriate bacterial polymerase.

Other expression constructs which can be employed to deliver a nucleic acid encoding an HNF1α,
HNF1β, or HNF4α transgene into cells are receptor-mediated delivery vehicles. These take advantage of the
selective uptake of macromolecules by receptor-mediated endocytosis in almost all eukaryotic cells.
Because of the cell type-specific distribution of various receptors, the delivery can be highly specific (Wu and
Wu, 1993).

Receptor-mediated gene targeting vehicles generally consist of two components: a cell receptor-specific
ligand and a DNA-binding agent. Several ligands have been used for receptor-mediated gene
transfer. The most extensively characterized ligands are asialoorosomucoid (ASOR) (Wu and Wu, 1987) and
transferrin (Wagner et al., 1990). Recently, a synthetic neoglycoprotein, which recognizes the same
receptor as ASOR, has been used as a gene delivery vehicle (Ferkol et al., 1993; Perales et al., 1994).
Mannose can be used to target the mannose receptor on liver cells. Also, antibodies to CD5 (CLL), CD22
(lymphoma), CD25 (T-cell leukemia) and MAA (melanoma) can similarly be used as targeting moieties. In
other embodiments, the delivery vehicle may comprise a ligand and a liposome.

Primary mammalian cell cultures may be prepared in various ways. In order for the cells to be kept 
viable while in vitro and in contact with the expression construct, it is necessary to ensure that the cells 
maintain contact with the correct ratio of oxygen and carbon dioxide and nutrients but are protected from 
microbial contamination. Cell culture techniques are well documented and are disclosed herein by reference 
(Freshner, 1992). 

One embodiment of the foregoing involves the use of gene transfer to immortalize cells for the 
production of proteins. The gene for the protein of interest may be transferred as described above into 
appropriate host cells followed by culture of cells under the appropriate conditions. The gene for virtually
any polypeptide may be employed in this manner. The generation of recombinant expression vectors, and
the elements included therein, are discussed above. Alternatively, the protein to be produced may be an
endogenous protein normally synthesized by the cell in question.

Examples of useful mammalian host cell lines are Vero and HeLa cells and cell lines of Chinese
hamster ovary, WI38, BHK, COS-7, 293, HepG2, NIH3T3, RIN and MDCK cells. In addition, a host cell
strain may be chosen that modulates the expression of the inserted sequences, or modifies and processes
the gene product in the manner desired. Such modifications (e.g., glycosylation) and processing (e.g.,
cleavage) of protein products may be important for the function of the protein. Different host cells have
characteristic and specific mechanisms for the post-translational processing and modification of proteins.
Appropriate cell lines or host systems can be chosen to ensure the correct modification and processing of
the foreign protein expressed.

A number of selection systems may be used including, but not limited to, HSV thymidine kinase,
hypoxanthine-guanine phosphoribosyltransferase and adenine phosphoribosyltransferase genes, in tk-,
hgprt- or aprt- cells, respectively. Also, anti-metabolite resistance can be used as the basis of selection
for dhfr, that confers resistance to methotrexate; gpt, that confers resistance to mycophenolic acid; neo, that confers
resistance to the aminoglycoside G418; and hygro, that confers resistance to hygromycin.

Animal cells can be propagated in vitro in two modes: as non-anchorage dependent cells growing
in suspension throughout the bulk of the culture or as anchorage-dependent cells requiring attachment to
a solid substrate for their propagation (i.e., a monolayer type of cell growth).

Non-anchorage dependent or suspension cultures from continuous established cell lines are the 
most widely used means of large scale production of cells and cell products. However, suspension 
cultured cells have limitations, such as tumorigenic potential and lower protein production than adherent 
cells. 

Large scale suspension culture of mammalian cells in stirred tanks is a common method for 
production of recombinant proteins. Two suspension culture reactor designs are in wide use - the stirred 
reactor and the airlift reactor. The stirred design has successfully been used on an 8000 liter capacity 
for the production of interferon. Cells are grown in a stainless steel tank with a height-to-diameter ratio 
of 1:1 to 3:1. The culture is usually mixed with one or more agitators, based on bladed disks or marine 
propeller patterns. Agitator systems offering less shear forces than blades have been described.
Agitation may be driven either directly or indirectly by magnetically coupled drives. Indirect drives reduce
the risk of microbial contamination through seals on stirrer shafts.

The airlift reactor, also initially described for microbial fermentation and later adapted for 
mammalian culture, relies on a gas stream to both mix and oxygenate the culture. The gas stream enters 
a riser section of the reactor and drives circulation. Gas disengages at the culture surface, causing 
denser liquid free of gas bubbles to travel downward in the downcomer section of the reactor. The main 
advantage of this design is the simplicity and lack of need for mechanical mixing. Typically, the height-to- 
diameter ratio is 10:1. The airlift reactor scales up relatively easily, has good mass transfer of gases and 
generates relatively low shear forces. 

N. Methods for Blocking Mutant HNF1α, HNF1β and HNF4α Action

In another embodiment of the present invention, there is contemplated the method of blocking the 
function of mutated HNF1α in MODY3, HNF1β in MODY4, and HNF4α in MODY1. In this way, it may be
possible to curtail the effects of the mutation in diabetes. In addition, it may prove effective to use this 
sort of therapeutic intervention in combination with more traditional diabetes therapies, such as the 
administration of insulin. 

The general form that this aspect of the invention will take is the provision, to a cell, of an agent
that will inhibit mutated HNF1α, HNF1β or HNF4α function. Four such agents are contemplated. First,
one may employ an antisense nucleic acid that will hybridize either to the mutated HNF1α, HNF1β or
HNF4α gene or the mutated HNF1α, HNF1β or HNF4α gene transcript, thereby preventing transcription
or translation, respectively. The considerations relevant to the design of antisense constructs have been
presented above. Second, one may utilize a mutated HNF1α-, HNF1β- or HNF4α-binding protein or
peptide, for example, a peptidomimetic or an antibody that binds immunologically to a mutated HNF1α,
HNF1β or HNF4α, respectively, the binding of either of which will block or reduce the activity of the mutated
HNF1α, HNF1β or HNF4α, respectively. The methods of making and selecting peptide binding partners
and antibodies are well known to those of skill in the art. Third, one may provide to the cell an antagonist
of mutated HNF1α, HNF1β or HNF4α, for example, the transactivation target sequence, alone or coupled
to another agent. And fourth, one may provide an agent that binds to the mutated HNF1α, HNF1β or
HNF4α target without the same functional result as would arise with mutated HNF1α, HNF1β or HNF4α
binding.

Provision of an HNF1α, HNF1β or HNF4α gene, a mutated HNF1α, HNF1β or HNF4α protein, or
a mutated HNF1α, HNF1β or HNF4α antagonist, would be according to any appropriate pharmaceutical
route. The formulation of such compositions and their delivery to tissues is discussed below. The method
by which the nucleic acid, protein or chemical is transferred, along with the preferred delivery route, will be
selected based on the particular site to be treated. Those of skill in the art are capable of determining the
most appropriate methods based on the relevant clinical considerations.

Many of the gene transfer techniques that generally are applied in vitro can be adapted for ex
vivo or in vivo use. For example, selected organs including the liver, skin, and muscle tissue of rats and
mice have been bombarded in vivo (Yang et al., 1990; Zelenin et al., 1991). Naked DNA also has been used
in clinical settings to effect gene therapy. These approaches may require surgical exposure of the target
tissue or direct target tissue injection. Nicolau et al. (1987) accomplished successful liposome-mediated
gene transfer in rats after intravenous injection.

Dubensky et al. (1984) successfully injected polyomavirus DNA in the form of CaPO4 precipitates
into liver and spleen of adult and newborn mice, demonstrating active viral replication and acute infection.
Benvenisty and Neshif (1986) also demonstrated that direct intraperitoneal injection of CaPO4-precipitated
plasmids results in expression of the transfected genes. Thus, it is envisioned that DNA encoding an
antisense construct also may be transferred in a similar manner in vivo.

Where the embodiment involves the use of an antibody that recognizes a mutated HNF1α, HNF1β
or HNF4α polypeptide, consideration must be given to the mechanism by which the antibody is introduced
into the cell cytoplasm. This can be accomplished, for example, by providing an expression construct that
encodes a single-chain antibody version of the antibody to be provided. Most of the discussion above
relating to expression constructs for antisense versions of HNF1α, HNF1β or HNF4α genes will be relevant
to this aspect of the invention. Alternatively, it is possible to present a bifunctional antibody, where one
antigen binding arm of the antibody recognizes an HNF1α, HNF1β or HNF4α polypeptide and the other
antigen binding arm recognizes a receptor on the surface of the cell to be targeted. Examples of suitable
receptors would be an HSV glycoprotein such as gB, gC, gD, or gH. In addition, it may be possible to exploit
the Fc-binding function associated with HSV gE, thereby obviating the need to sacrifice one arm of the 
antibody for purposes of cell targeting. 

Advantageously, one may combine this approach with more conventional diabetes therapy options.

O. Pharmaceuticals and In Vivo Methods for the Treatment of Disease

Aqueous pharmaceutical compositions of the present invention will have an effective amount of
an HNF1α, HNF1β or HNF4α expression construct, an antisense HNF1α, HNF1β or HNF4α expression
construct, an expression construct that encodes a therapeutic gene along with HNF1α, HNF1β or
HNF4α, a protein or compound that inhibits mutated HNF1α, HNF1β or HNF4α function, respectively,
such as an anti-mutant HNF1α antibody, an anti-mutant HNF1β antibody or an anti-mutant HNF4α
antibody, or a mutated HNF1α polypeptide, mutated HNF1β polypeptide or a mutated HNF4α
polypeptide. Such compositions generally will be dissolved or dispersed in a pharmaceutically acceptable
carrier or aqueous medium. An "effective amount," for the purposes of therapy, is defined as that amount
that causes a clinically measurable difference in the condition of the subject. This amount will vary 
depending on the substance, the condition of the patient, the type of treatment, the location of the lesion, 
etc. 

The phrases "pharmaceutically or pharmacologically acceptable" refer to molecular entities and 
compositions that do not produce an adverse, allergic or other untoward reaction when administered to 
an animal, or human, as appropriate. As used herein, "pharmaceutically acceptable carrier" includes any 
and all solvents, dispersion media, coatings, antibacterial and antifungal agents, isotonic and absorption 
delaying agents and the like. The use of such media and agents for pharmaceutically active substances is 
well known in the art. Except insofar as any conventional media or agent is incompatible with the active 
ingredients, its use in the therapeutic compositions is contemplated. Supplementary active ingredients, 
such as other anti-diabetic agents, can also be incorporated into the compositions. 

In addition to the compounds formulated for parenteral administration, such as those for 
intravenous or intramuscular injection, other pharmaceutically acceptable forms include, e.g., tablets or 
other solids for oral administration; time release capsules; and any other form currently used, including 
cremes, lotions, mouthwashes, inhalants and the like. 

The active compounds of the present invention will often be formulated for parenteral 
administration, e.g., formulated for injection via the intravenous, intramuscular, subcutaneous, or even 
intraperitoneal routes. The preparation of an aqueous composition that contains mutated HNF1α, HNF1β
or HNF4α inhibitory compounds, alone or in combination with conventional diabetes therapy agents, as
active ingredients will be known to those of skill in the art in light of the present disclosure. Typically,
such compositions can be prepared as injectables, either as liquid solutions or suspensions; solid forms
suitable for using to prepare solutions or suspensions upon the addition of a liquid prior to injection can
also be prepared; and the preparations can also be emulsified.

Solutions of the active compounds as free base or pharmacologically acceptable salts can be 
prepared in water suitably mixed with a surfactant, such as hydroxypropylcellulose. Dispersions can also 
be prepared in glycerol, liquid polyethylene glycols, and mixtures thereof and in oils. Under ordinary 
conditions of storage and use, these preparations contain a preservative to prevent the growth of 
microorganisms. 

The pharmaceutical forms suitable for injectable use include sterile aqueous solutions or 
dispersions; formulations including sesame oil, peanut oil or aqueous propylene glycol; and sterile powders 
for the extemporaneous preparation of sterile injectable solutions or dispersions. In many cases, the form 
must be sterile and must be fluid to the extent that easy syringability exists. It must be stable under the 
conditions of manufacture and storage and must be preserved against the contaminating action of 
microorganisms, such as bacteria and fungi. 

The active compounds may be formulated into a composition in a neutral or salt form.
Pharmaceutically acceptable salts include the acid addition salts (formed with the free amino groups of
the protein), which are formed with inorganic acids such as, for example, hydrochloric or phosphoric
acids, or such organic acids as acetic, oxalic, tartaric, mandelic, and the like. Salts formed with the free
carboxyl groups can also be derived from inorganic bases such as, for example, sodium, potassium,
ammonium, calcium, or ferric hydroxides, and such organic bases as isopropylamine, trimethylamine,
histidine, procaine and the like.

The carrier also can be a solvent or dispersion medium containing, for example, water, ethanol, 
polyol (for example, glycerol, propylene glycol, and liquid polyethylene glycol, and the like), suitable 
mixtures thereof, and vegetable oils. The proper fluidity can be maintained, for example, by the use of a 
coating, such as lecithin, by the maintenance of the required particle size in the case of dispersion and by 
the use of surfactants. The prevention of the action of microorganisms can be brought about by various
antibacterial and antifungal agents, for example, parabens, chlorobutanol, phenol, sorbic acid, thimerosal,
and the like. In many cases, it will be preferable to include isotonic agents, for example, sugars or sodium
chloride. Prolonged absorption of the injectable compositions can be brought about by the use in the
compositions of agents delaying absorption, for example, aluminum monostearate and gelatin.

Sterile injectable solutions are prepared by incorporating the active compounds in the required
amount in the appropriate solvent with various of the other ingredients enumerated above, as required,
followed by filter sterilization. Generally, dispersions are prepared by incorporating the various
sterilized active ingredients into a sterile vehicle which contains the basic dispersion medium and the
required other ingredients from those enumerated above. In the case of sterile powders for the
preparation of sterile injectable solutions, the preferred methods of preparation are vacuum-drying and
freeze-drying techniques which yield a powder of the active ingredient plus any additional desired
ingredient from a previously sterile-filtered solution thereof.

Upon formulation, solutions will be administered in a manner compatible with the dosage 
formulation and in such amount as is therapeutically effective. The formulations are easily administered 
in a variety of dosage forms, such as the type of injectable solutions described above, with even drug 
release capsules and the like being employable. 

For parenteral administration in an aqueous solution, for example, the solution should be suitably 
buffered if necessary and the liquid diluent first rendered isotonic with sufficient saline or glucose. These 
particular aqueous solutions are especially suitable for intravenous, intramuscular, subcutaneous and 
intraperitoneal administration. In this connection, sterile aqueous media which can be employed will be 
known to those of skill in the art in light of the present disclosure. For example, one dosage could be 
dissolved in 1 mL of isotonic NaCl solution and either added to 1000 mL of hypodermoclysis fluid or
injected at the proposed site of infusion (see, for example, "Remington's Pharmaceutical Sciences," 15th
Edition, pages 1035-1038 and 1570-1580). Some variation in dosage will necessarily occur depending on
the condition of the subject being treated. The person responsible for administration will, in any event,
determine the appropriate dose for the individual subject.

P. Examples

The following examples are included to demonstrate preferred embodiments of the invention. It 
should be appreciated by those of skill in the art that the techniques disclosed in the examples which 
follow represent techniques discovered by the inventor to function well in the practice of the invention, 
and thus can be considered to constitute preferred modes for its practice. However, those of skill in the
art should, in light of the present disclosure, appreciate that many changes can be made in the specific
embodiments which are disclosed and still obtain a like or similar result without departing from the spirit
and scope of the invention.

EXAMPLE 1 

Altered Insulin Secretory Responses To Glucose In Diabetic And Nondiabetic Subjects With
Mutations In The Diabetes Mellitus Susceptibility Gene MODY3 On Chromosome 12

The present Example determines whether alterations in the dose-response relationships between
plasma glucose concentration and insulin secretion rate (ISR) can be identified in subjects who have
inherited an at-risk MODY3 allele but who have not yet developed overt diabetes.

1. Methods

Subjects from MODY3 pedigrees
Thirteen Caucasian subjects who were positive for MODY3 markers on chromosome 12q were
studied. Two subjects were members of a French pedigree, F549 (Vaxillaire et al., 1995), three were from
the P pedigree from Michigan (Menzel et al., 1995), two were from a New York pedigree, the H pedigree
depicted in FIG. 1, two were from a Liverpool pedigree, the BDA1 pedigree, and four were from a Nottingham
pedigree, the BDA12 pedigree (FIG. 1). Each subject was typed with a series of DNA markers in the
region of MODY3 to determine whether or not they had inherited the at-risk haplotype segregating with
MODY in that family. The diabetes status of each subject, except for MD13, had been determined by oral
glucose tolerance testing (OGTT) according to the World Health Organization (WHO) criteria (WHO Study
Group on Diabetes Mellitus, 1985) and confirmed at the time of the studies by the measurement of
glycosylated hemoglobin. Based on the results of the OGTT and glycosylated hemoglobin values within or
above the normal range for the inventors' laboratory (<7.4%), subjects were divided into diabetic and
nondiabetic groups.

Nondiabetic MODY3 subjects (n=6).
The clinical profiles of these subjects are described in Table 4. All had normal fasting glucose and
glycosylated hemoglobin (<7.4%) levels at the time of this study. At the time of study 4 subjects had
IGT (MD1, MD4, MD9, MD13) and 2 subjects had normal glucose tolerance (NGT) (MD3, MD5). Based on
previous glucose tolerance testing, MD1 had IGT, MD3 consistently demonstrated NGT on serial OGTTs,
MD4 was diagnosed with IGT in 6/93 and has persistent IGT with a 2-h postprandial blood glucose level
of 147 mg/dl, MD5 was initially diagnosed with IGT and subsequently had 2 normal OGTTs with 2-h
blood glucose values of 130 mg/dl and 105 mg/dl, respectively, MD9 had IGT with a 2-h post-challenge
blood glucose level of 167 mg/dl and no other blood glucose level above 200 mg/dl, and MD13 had IGT
with elevated postprandial blood glucose levels in the past of up to 160 mg/dl. Age of diagnosis refers to
the age at which abnormal glucose tolerance was diagnosed. None of these subjects was ever
diagnosed with NIDDM.

Diabetic MODY3 subjects (n=7).
Clinical profiles are shown in Table 4. All subjects had been treated with oral hypoglycemic
agents except for MD8, who was taking insulin which was discontinued two days prior to the study, and
MD12, who was treated with diet alone. All subjects had discontinued treatment with oral hypoglycemic
agents at least three weeks prior to being studied. As shown in Table 4, fasting plasma glucose and total
glycated hemoglobin levels were higher in the diabetic group and fasting insulin levels were lower.
The diabetic group was also significantly older than the other two groups.

Nondiabetic controls.
The control subjects consisted of 5 males and one female (5 Caucasians and 1 African American)
who did not have a personal or family history of NIDDM. They were all within 20% of ideal body weight,
had no medical illnesses and were not receiving any medications. Data from four of the control subjects
have previously been published (Byrne et al., 1994; Byrne et al., 1995a). BMI was not significantly
different between the control and diabetic or nondiabetic MODY3 groups.

Female volunteers had regular menstrual cycles and were studied only in the early follicular
phase. The study was approved by the Institutional Review Board of the University of Chicago Medical
Center and all subjects and/or parents provided written informed consent.

Experimental protocol
Studies began at 0800 h with subjects in the recumbent position after a 12-h overnight fast. An
intravenous catheter was placed in each forearm, one for blood sampling and one for glucose
administration. In all experiments, the arm containing the sampling catheter was maintained in a heating
blanket or hot hand box to ensure arterialization of the venous sample.

Graded glucose infusion studies.
These studies were designed to characterize the dose-response relationships between glucose
and insulin secretion rate (ISR). In order to eliminate potentially confounding effects of differences in the
basal glucose concentration, each study began with the administration of a small bolus of insulin
intravenously (0.007 U/kg) followed by a low dose continuous infusion of insulin to lower the fasting
plasma glucose to similar levels in all groups (target plasma glucose ~5 mM). After a period of 20 min
during which time the exogenously administered insulin was allowed to decay, samples were drawn at 10
min intervals for 30 min to define baseline insulin, glucose and C-peptide levels. An intravenous infusion
of 20% dextrose was then started at a rate of 1 mg/kg/min, followed by infusions of 2 mg/kg/min, 3
mg/kg/min, 4 mg/kg/min, 6 mg/kg/min and 8 mg/kg/min. Each infusion rate was administered for a period
of 40 min. Insulin, C-peptide and glucose concentrations were measured at 10, 20, 30 and 40 min into
each infusion period.
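
By way of illustration only, the stepped infusion and sampling schedule described above may be laid out as in the following sketch. The infusion rates, the 40 min step length, the sampling offsets and the 20% dextrose concentration are taken from the protocol; the choice of time zero as the start of the first infusion step, the 70 kg body weight and all function names are assumptions made for the example and form no part of the study.

```python
# Rates, step length and sampling offsets as stated in the protocol text.
RATES_MG_PER_KG_MIN = [1, 2, 3, 4, 6, 8]
STEP_MIN = 40
SAMPLE_OFFSETS_MIN = [10, 20, 30, 40]


def graded_infusion_schedule():
    """Return one entry per infusion step, with times in minutes.

    Time zero is assumed (for illustration) to be the start of the
    first dextrose infusion step.
    """
    schedule = []
    for step, rate in enumerate(RATES_MG_PER_KG_MIN):
        start = step * STEP_MIN
        schedule.append({
            "rate_mg_per_kg_min": rate,
            "start_min": start,
            "end_min": start + STEP_MIN,
            "sample_times_min": [start + off for off in SAMPLE_OFFSETS_MIN],
        })
    return schedule


def dextrose_pump_rate_ml_per_h(rate_mg_per_kg_min, weight_kg,
                                dextrose_percent=20.0):
    """Convert a weight-based glucose dose to a pump rate in mL/h.

    A w/v percentage is grams per 100 mL, so 20% dextrose is 200 mg/mL.
    """
    mg_per_ml = dextrose_percent * 10.0
    return rate_mg_per_kg_min * weight_kg * 60.0 / mg_per_ml


if __name__ == "__main__":
    for entry in graded_infusion_schedule():
        # 70 kg is a hypothetical body weight used only for the printout.
        pump = dextrose_pump_rate_ml_per_h(entry["rate_mg_per_kg_min"], 70.0)
        print(entry, round(pump, 1))
```

Under these assumptions the six steps occupy 240 min in total, and each step contributes four sampling times.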

Effects of prolonged intravenous glucose administration on insulin secretory responses to
graded glucose infusions.

At the completion of the graded glucose infusion study described above, glucose was infused
intravenously for a 42-h period at a rate of 4-6 mg/kg/min in order to determine if the insulin secretory
responses to glucose could be primed by exposure to mild hyperglycemia. Subjects also consumed three
carbohydrate-enriched meals during the second day of this glucose infusion. At the conclusion of the 42-h
infusion period, the infusion rate was reduced over a 60 min period and then stopped. Thirty minutes
later, the graded glucose infusion study was repeated. Plasma glucose levels were obtained every four
hours during the 42-h glucose infusion.

Assays.

Plasma glucose was measured by the glucose oxidase technique (YSI analyzer, Yellow Springs,
OH). The coefficient of variation of this method is <2%. Serum insulin was assayed by a double
antibody technique (Morgan and Lazarow, 1963). The average intra-assay coefficient of variation was
6%. Plasma C-peptide was measured as previously described (Faber et al., 1978). The lower limit of
sensitivity of the assay was 0.02 pmol/ml and the intra-assay coefficient of variation averaged 6%. All
samples were measured in duplicate. Assays were performed at the University of Chicago.

Data analysis

Estimation of ISRs. ISRs were derived by deconvolution of plasma C-peptide concentrations
assuming a two-compartmental model of C-peptide clearance kinetics (Van Cauter et al., 1992; Eaton et
al., 1980; Polonsky et al., 1986).
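
The principle of such a deconvolution may be illustrated by the following simplified sketch, which is not the procedure of the cited references. It assumes a generic two-compartment model in which C-peptide is secreted into a sampled central compartment, exchanges with a peripheral compartment, and is cleared from the central compartment; it further assumes that the central concentration varies linearly between samples and that the system is at steady state at the first sample. The rate constants k01, k12 and k21, the distribution volume and the function name are hypothetical placeholders introduced for illustration.

```python
import math


def estimate_isr(times_min, c1, k01, k12, k21, volume=1.0):
    """Mean secretion rate over each sampling interval.

    Model (concentrations per unit central volume):
        dC1/dt = S - (k01 + k21) * C1 + k12 * C2
        dC2/dt = k21 * C1 - k12 * C2
    C1 is the measured central concentration, C2 the unobserved
    peripheral compartment, S the secretion rate to be recovered.
    """
    # Assumed steady state at the first sample: dC2/dt = 0.
    c2 = k21 / k12 * c1[0]
    rates = []
    for i in range(len(times_min) - 1):
        h = times_min[i + 1] - times_min[i]
        a = c1[i]
        b = (c1[i + 1] - c1[i]) / h          # slope of C1 within the interval
        decay = math.exp(-k12 * h)
        gain = (1.0 - decay) / k12
        # Exact solution of the C2 equation for a linearly varying C1.
        c2_next = c2 * decay + k21 * (a * gain + b * (h / k12 - gain / k12))
        mean_c1 = 0.5 * (c1[i] + c1[i + 1])
        # Mass balance over the interval: secretion equals the change in
        # both compartments plus what was irreversibly cleared.
        s = (c1[i + 1] - c1[i] + c2_next - c2) / h + k01 * mean_c1
        rates.append(volume * s)
        c2 = c2_next
    return rates


if __name__ == "__main__":
    # Hypothetical samples (pmol/ml) and rate constants (1/min).
    t = [0, 10, 20, 30, 40]
    cp = [0.50, 0.55, 0.70, 0.90, 1.00]
    print(estimate_isr(t, cp, k01=0.06, k12=0.05, k21=0.05, volume=4000.0))
```

With a constant C-peptide concentration the sketch returns k01 multiplied by that concentration and the volume, which is the steady-state balance between secretion and clearance.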

Relationship between glucose and ISRs. 
The relationship between plasma glucose and ISR was explored in each individual by analyzing 
the data from the graded glucose infusion studies. Baseline glucose, insulin, C-peptide and ISRs were 
calculated as the mean of the values in the -30, -20, -10 and 0 min samples. During each glucose infusion
period, average glucose and ISRs were calculated. Mean ISRs for each period were then plotted against
the corresponding mean glucose level, thereby establishing a dose-response relationship between glucose
and ISR. Mean ISRs were determined for 1 mM glucose concentration intervals by calculating the area
under the curve for each interval using the trapezoidal rule. This area was divided by 1 mM to obtain the
correct units (pmol/min).
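
The interval calculation just described may be sketched as follows. The sketch assumes that the dose-response curve is formed by joining the per-period (mean glucose, mean ISR) points with straight lines; the function names and the numerical values in the usage example are hypothetical and are not study data.

```python
def interpolate(x, xs, ys):
    """Linear interpolation of y at x on a curve sorted by xs."""
    if x < xs[0] or x > xs[-1]:
        raise ValueError("x lies outside the measured glucose range")
    for i in range(len(xs) - 1):
        if xs[i] <= x <= xs[i + 1]:
            frac = (x - xs[i]) / (xs[i + 1] - xs[i])
            return ys[i] + frac * (ys[i + 1] - ys[i])
    return ys[-1]


def mean_isr_over_interval(glucose_mM, isr_pmol_min, lo_mM, hi_mM):
    """Area under the glucose-ISR curve between lo and hi (trapezoidal
    rule), divided by the interval width to give pmol/min."""
    # Interval end points plus any measured points lying strictly inside.
    xs = [lo_mM] + [g for g in glucose_mM if lo_mM < g < hi_mM] + [hi_mM]
    ys = [interpolate(x, glucose_mM, isr_pmol_min) for x in xs]
    area = sum((xs[i + 1] - xs[i]) * (ys[i] + ys[i + 1]) / 2.0
               for i in range(len(xs) - 1))
    return area / (hi_mM - lo_mM)


if __name__ == "__main__":
    # Hypothetical dose-response points for one subject.
    glucose = [5.0, 6.5, 9.0]        # mM
    isr = [100.0, 250.0, 300.0]      # pmol/min
    print(mean_isr_over_interval(glucose, isr, 6.0, 7.0))
```

Applying the same function to a wider interval, for example 5 to 9 mM, gives the average ISR over that whole glucose range.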
Statistical analyses 

All results are expressed as mean ± SEM. Data analysis was performed using the Statistical
Analysis System (SAS Version 6 Edition for Personal Computers, SAS Institute, Inc., Cary, NC). The
significance of differences between the groups was determined using paired or unpaired t-tests or
analysis of variance where appropriate. Tukey's studentized range test was used for post hoc 
comparisons. Pearson's correlation coefficient was used to evaluate correlations between pairs of 
parameters. 

2. Results 

Glucose, insulin and ISR during graded intravenous glucose infusion 
Fasting plasma glucose levels were higher in the MODY3 diabetic group compared to the
nondiabetic group or controls (7.5±0.7 mM vs. 4.5±0.2 mM and 4.7±0.2 mM, respectively; P<0.0008).
The corresponding fasting plasma insulin levels were lower in the diabetic MODY3 group compared to
nondiabetics and controls (Table 4). Glucose, insulin and ISR responses to the glucose infusions are
shown in FIG. 2A, FIG. 2B and FIG. 2C, respectively. Average glucose concentrations over the duration of
the study were higher in the diabetic MODY3 subjects compared to the nondiabetic MODY3 and control
subjects (8.5±0.4 mM vs. 6.3±0.3 mM and 6.4±0.2 mM; P<0.0002) (FIG. 2A). Average insulin levels were
lower in the diabetic and nondiabetic MODY3 groups than in the controls (57.4±8.2 pmol/L and
79.8±11.0 pmol/L vs. 139.3±14.7 pmol/L; P<0.0006) (FIG. 2B). Average ISRs were significantly lower in
diabetic compared to the nondiabetic MODY3 subjects and the controls (116±18.8 pmol/min vs.
179.7±19.9 pmol/min and 199.5±18.7 pmol/min; P<0.02) (FIG. 2C).

TABLE 5

Insulin Secreted between 5 and 9 mM glucose

ID                    Baseline        Post-glucose    Priming effect %

Non-diabetic MODY3
MD1                   188.1           221.6           17.9
MD3                   164.5           255             55
MD4                   136.5           208.3           52.5
MD5                   297.5           342.5           15.1
MD9                   249.1           292.1           34.5
MD13                  248.1           234.2           -5.9
MEAN                  214.3±24.8      259±20.6        35±8

Diabetic MODY3
MD2                   87.4            68.9            -22
MD6                   131.5           109.1           -17
MD7                   144.6           85.2            -41
MD8                   156.6           189.3           20.9
MD10                  63.7            34.9            -45
MD11                  38.2            28.4            -26
MD12                  102.6           115.1           12.2
MEAN                  100.8±17.3*     90.0±20.8*      -11.4±9.8*

Controls
[illegible]           318.1           356.8           12.2
C07                   209.5           272.1           29.2
C09                   166.9           223.1           33.7
C12                   235.6           381.6           62.0
C13                   215.6           306.5           42.2
C18                   120.1           180.5           50.3
MEAN                  211±27          287±32          38±7

p value               P<0.004         P<0.002         P<0.009


The amount of insulin secreted as glucose was raised from 5 to 9 mM in study subjects
before and after a priming intravenous infusion of glucose. Asterisks refer to statistically 
significant differences between the diabetic subjects and those in the other two groups 
using Tukey's studentized range test for post-hoc comparisons. 
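
The derived columns of Table 5 may be illustrated by the following sketch. It assumes that the priming effect is the percent change from the baseline value to the post-glucose value, an assumption that reproduces most rows of the table, and that the group summaries are the arithmetic mean and the standard error of the mean computed with the n-1 sample variance. The function names are illustrative only; the input numbers are the control values listed in Table 5.

```python
import math


def priming_effect_percent(baseline, post_glucose):
    """Percent change in insulin secreted after the priming infusion."""
    return (post_glucose - baseline) / baseline * 100.0


def mean_and_sem(values):
    """Arithmetic mean and standard error of the mean (n-1 variance)."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, math.sqrt(variance / n)


if __name__ == "__main__":
    # Control subjects, baseline and post-glucose columns of Table 5.
    baseline = [318.1, 209.5, 166.9, 235.6, 215.6, 120.1]
    post = [356.8, 272.1, 223.1, 381.6, 306.5, 180.5]
    for b, p in zip(baseline, post):
        print(round(priming_effect_percent(b, p), 1))
    print(mean_and_sem(baseline))
    print(mean_and_sem(post))
```

On these inputs the group summaries round to the tabulated control means of 211±27 and 287±32.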
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Changes in insulin sensitivity 

Insulin resistance estimated by the Homeostasis Model Assessment Method (HOMA) (Matthews
et al., 1985) failed to demonstrate significant differences between the groups (diabetic MODY3:
1.9±0.2; nondiabetic MODY3: 1.7±0.3; controls: 2.4±0.2; P=0.11).
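
For orientation, the widely used simplified form of the HOMA insulin resistance index is the product of fasting glucose (mmol/L) and fasting insulin (µU/mL) divided by 22.5. The text does not state which form of the HOMA calculation was applied, so the following sketch is offered only as an illustration of that simplified form. The conversion factor from pmol/L to µU/mL is an assumed, assay-dependent value, and the function names are hypothetical.

```python
def homa_ir(fasting_glucose_mmol_l, fasting_insulin_uU_ml):
    """Simplified HOMA insulin resistance index.

    The constant 22.5 normalizes the index to 1.0 for a reference
    subject with glucose 4.5 mmol/L and insulin 5 uU/mL.
    """
    return fasting_glucose_mmol_l * fasting_insulin_uU_ml / 22.5


def insulin_pmol_l_to_uU_ml(insulin_pmol_l, pmol_per_uU=6.0):
    """Convert insulin units; the factor 6.0 is an assumption and
    depends on the assay standard in use."""
    return insulin_pmol_l / pmol_per_uU


if __name__ == "__main__":
    # Hypothetical fasting values, not study data.
    insulin = insulin_pmol_l_to_uU_ml(60.0)
    print(homa_ir(5.0, insulin))
```

A higher index corresponds to greater estimated insulin resistance at a given fasting glucose.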

Dose-response relationship between glucose and ISR

The ISR in the three groups was compared at the same plasma glucose level by plotting the mean
ISR at each glucose infusion rate against the corresponding mean glucose level. The resulting glucose-ISR
dose-response relationships are shown in FIG. 3. Over the 5-9 mM glucose concentration interval the
diabetic MODY3 group secreted significantly less insulin than subjects in the nondiabetic MODY3 and
control groups (101±17 pmol/min vs. 214±25 pmol/min and 211±27 pmol/min, respectively;
P<0.004). The mean insulin secretion rate did not differ between these latter two groups.

The dose-response curves (FIG. 3) indicate that the insulin secretion rates were similar in nondiabetic MODY3 subjects and controls at lower glucose concentrations. The amount of insulin secreted as the glucose concentration was increased from 5 to 7 mM was similar in these two groups (180±19 vs. 160±17 pmol/min; P=0.45). Over the 7-8 mM glucose interval the nondiabetic MODY3 subjects secreted 243.5±31.5 pmol/min compared to 284.7±30.5 pmol/min in controls (P=0.37). From 8-9 mM glucose they secreted 257.1±35.0 pmol/min compared to 354.0±43.4 pmol/min in controls (P=0.12) (FIG. 3). As the glucose concentration was increased from 7-8 mM to 8-9 mM the increase in insulin secretion rate in the nondiabetic MODY3 subjects was significantly less than in the controls (37.3±13.5 vs. 75.7±9.5 pmol/min; P<0.05).

Effect of low-dose glucose infusion on relationships between glucose and ISR

Mean glucose levels achieved during the 42-h constant glucose infusion were significantly higher in the diabetic compared to the nondiabetic MODY3 group and controls (14.9±0.6 mM vs. 10.0±1.4 mM vs. 6.6±0.3 mM; P<0.0001). The glucose infusion was discontinued after 42 h and low-dose insulin was administered, resulting in a fall in the plasma glucose concentration to similar levels in the two groups. The graded intravenous glucose infusion study was then repeated in each subject.

In order to quantify the priming effect of glucose on insulin secretion, the average ISR measured during each glucose infusion rate was plotted against the average plasma glucose concentration and compared with values obtained before glucose infusion. Over the glucose concentration range between 5 and 9 mM glucose, control subjects secreted 211±27 pmol/min before and 287±32 pmol/min (P<0.005) insulin after glucose infusion (FIG. 4A). There was a shift in the glucose-ISR dose-response curves upwards and to the left, with ISR increasing by 38±7%. The nondiabetic MODY3 group increased their ISR from 214±25 pmol/min to 259±21 pmol/min (P<0.03) (FIG. 4B). The diabetic MODY3 group had a small and nonsignificant 13±10% decrease in ISR after glucose administration (101±17 pmol/min to 90±21 pmol/min; P>0.9) (FIG. 4C). Individual values for ISR from 5-9 mM glucose before and after low-dose glucose infusion are given in Table 5.
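
The percent priming reported in Table 5 appears to be the relative change in ISR between the baseline and post-glucose studies. The following Python sketch assumes that definition, which is not stated explicitly in the text, and applies it to three baseline and post-glucose pairs taken from Table 5. The function name is hypothetical.

```python
def percent_priming(baseline_isr: float, post_glucose_isr: float) -> float:
    """Relative change in ISR (pmol/min) after the priming glucose infusion, in percent."""
    return (post_glucose_isr - baseline_isr) / baseline_isr * 100.0


# (baseline, post-glucose) ISR pairs for three subjects listed in Table 5.
table5_examples = {
    "MD3": (164.5, 255.0),   # nondiabetic MODY3
    "MD7": (144.6, 85.2),    # diabetic MODY3
    "C12": (235.6, 381.6),   # control
}

for subject, (before, after) in table5_examples.items():
    print(subject, round(percent_priming(before, after), 1))
```

Values recomputed in this way from the rounded table entries may differ slightly from the tabulated percentages, which were presumably derived from unrounded data.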

Relationship between glycosylated hemoglobin levels and parameters of the insulin secretory response to glucose

There was a significant negative correlation between glycosylated hemoglobin and percent priming (r = -0.78; P<0.002) and between glycosylated hemoglobin and ISR from 5-9 mM glucose (r = -0.61; P<0.03). By contrast there was no significant decrease in ISR as glucose concentrations rose from 7-8 to 8-9 mM with increasing glycosylated hemoglobin levels (r = -0.07; P=0.82).
3. Discussion

Basal glucose levels were higher and insulin levels were lower in MODY3 subjects with diabetes compared to nondiabetic subjects or normal healthy controls. In response to the graded glucose infusion, insulin secretion rates were significantly lower in the diabetic subjects over a broad range of glucose concentrations. Insulin secretion rates in the nondiabetic MODY3 subjects were not significantly different from the controls at plasma glucose levels <8 mM. As glucose rose above this level, however, the increase in insulin secretion in these subjects was significantly reduced. Administration of glucose by intravenous infusion for 42 h resulted in a significant increase in the amount of insulin secreted over the 5-9 mM glucose concentration range in the controls and nondiabetic MODY3 subjects (by 38% and 35%, respectively), but no significant change was observed in the diabetic MODY3 subjects. In conclusion, in nondiabetic MODY3 subjects insulin secretion demonstrates a diminished ability to respond when blood glucose exceeds 8 mM. The priming effect of glucose on insulin secretion is preserved. Thus, β-cell dysfunction is present prior to the onset of overt hyperglycemia in this form of MODY. The defect in insulin secretion in the nondiabetic MODY3 subjects differs from that reported previously in nondiabetic MODY1 or mildly diabetic MODY2 subjects.

EXAMPLE 2

Mutations in HNF-1α Relating to MODY3 Type Diabetes
1. Materials and Methods

Isolation of partial sequence of the human HNF-1α gene.
The PAC clone, 254A7, containing the human HNF-1α gene was isolated from a library (Genome Systems, St. Louis, MO) by screening PAC DNA pools with PCR and the primers HNF1P1 (5'-TACACCACTCTGGCAGCCACACT-3', SEQ ID NO:10) and HNF1P2 (5'-CGGTGGGTACATTGGTGACAGAAC-3', SEQ ID NO:11). The sequences of the exons and flanking introns were determined after subcloning fragments of 254A7 into pGEM-4Z (Promega Biotec, Madison, WI) or pBluescript SK+ (Stratagene, La Jolla, CA) and sequencing using primers based on the sequence of the human HNF-1α cDNA (Bach et al., 1990; Bach and Yaniv, 1993) and selected using the conserved exon-intron organization of the mouse and rat genes (Bach et al., 1992) as a guide. Sequencing was carried out using an AmpliTaq FS Dye Terminator Cycle Sequencing Kit (ABI, Foster City, CA) on an ABI Prism™ 377 DNA Sequencer (ABI). The sequences of the exon 2/intron 2, exon 3/intron 3, intron 6/exon 7, and intron 8/exon 9/intron 9 junctions were determined by directly sequencing PCR products generated by amplification of PAC 254A7 or human genomic DNA. FIG. 11 shows the cDNA sequence of HNF-1α.

Screening of the HNF-1α gene for mutations.
The ten exons and flanking introns of the HNF-1α gene of an affected subject from each family in which MODY cosegregated with markers spanning the MODY3 region of chromosome 12 (subjects with the MODY3 form of NIDDM) were amplified using PCR and specific primers (Table 6). PCR conditions were denaturation at 94°C for 5 min followed by 35 cycles of denaturation at 94°C for 30 sec, annealing at 62°C for 30 sec (except for exon 9, for which the annealing temperature was 60°C) and extension at 72°C for 45 sec, and a final extension at 72°C for 10 min. The PCR products were purified using a Centricon-100 membrane (Amicon, Beverly, MA) and sequenced from both ends using the primers shown in Table 6, an AmpliTaq FS Dye Terminator Cycle Sequencing Kit and an ABI Prism™ 377 DNA Sequencer. The presence of the specific mutation in other family members was assessed by amplifying and directly sequencing the appropriate exon. At least 40 normal unrelated healthy non-diabetic non-Hispanic white subjects (80 chromosomes) were also similarly screened. DNA polymorphisms identified during the course of screening patients for mutations were characterized by PCR and direct sequencing, or digestion with an appropriate restriction endonuclease and gel electrophoresis.

2. Results

Table 7 lists the DNA polymorphisms identified in the coding region of the HNF-1α gene. Of course, these are exemplary polymorphisms and those of skill in the art will easily be able to employ the methods and descriptions set forth in the present invention to identify other polymorphisms.

Table 7.

DNA polymorphisms identified in coding region of human HNF-1α gene

Location     Codon/nucleotide   Nucleotide change       Frequency

Exon 1       17                 CTC(Leu)→CTG(Leu)       C, 0.57; G, 0.43
Exon 1       27                 ATC(Ile)→CTC(Leu)       A, 0.63; C, 0.37
Exon 1       98                 GCC(Ala)→GTC(Val)       C, 0.98; T, 0.02
Exon 4       279                GGG(Gly)→GGC(Gly)       G, 0.69; C, 0.31
Exon 7       459                CTG(Leu)→TTG(Leu)       C, 0.63; T, 0.37
Exon 7       487                AGC(Ser)→AAC(Asn)       G, 0.68; A, 0.32
Exon 8       515                ACG(Thr)→ACA(Thr)       G, 0.79; A, 0.21
Intron 1     nt -91             A→G                     A, 0.88; G, 0.12
Intron 1     nt -42             G→A                     G, 0.66; A, 0.34
Intron 2     nt -51             T→A                     T, 0.85; A, 0.15
Intron 2     nt -23             C→T                     C, 0.88; T, 0.12
Intron 5     nt -47             C→T                     C, 0.99; T, 0.01
Intron 7     nt -7              G→A                     G, 0.57; A, 0.43
Intron 9     nt -44             C→T                     C, 0.96; T, 0.04
Intron 9     nt -24             T→C                     T, 0.59; C, 0.41



Table 8 shows a summary of mutations identified in human HNF-1α in patients with MODY3. Sixteen exemplary mutations were identified in the HNF-1α gene in MODY3 patients but were not present in unaffected individuals. These mutations include frameshifts in exons 1, 4, 6 and 9, missense mutations in exons 2 and 7, as well as abnormal splicing in introns 5 and 9. The results described herein demonstrate that mutations in this transcription factor can cause diabetes mellitus and focus attention on the role of HNF-1α in determining normal pancreatic β-cell function.

3. Discussion

Linkage analysis localized MODY3 to a 10 cM interval of chromosome 12 between the markers D12S86 and D12S342 (Vaxillaire et al., 1995) and then to a 5 cM interval between the markers D12S86 and D12S807/D12S820 (Menzel, S. et al., 1995). A combined YAC, BAC and PAC contig spanning D12S86 and D12S807 (FIG. 9) was generated using information in public databases (Chumakov et al., 1995; Hudson et al., 1995) and screening appropriate libraries (YAC and BAC, Research Genetics, Huntsville, Alabama; and PAC, Genome Systems, St. Louis, Missouri) with STSs from the MODY3 region. The physical map allowed localization of new polymorphisms as they were reported as well as generation of new markers to further localize recombination events in key individuals. Such studies refined the localization of MODY3 to the 3 cM interval between D12S1666 and the polymorphic STS UC-39. Fluorescence in situ chromosomal hybridization using the BAC 162B15 mapped the contig to chromosome band 12q24.2.

This combination of genetic and physical mapping information was used to begin a systematic search for MODY3. Using a combination of approaches including testing genes known to be on the long arm of chromosome 12 to see if they mapped into the contig, exon-trapping (Church et al., 1994), and cDNA selection (Kaplan et al., 1992) using human pancreatic islet cDNA (clinical studies had shown that insulin secretion was abnormal in MODY3 patients, and thus islets were a likely site of expression of MODY3 mRNA and protein), the inventors identified 14 genes encoding known proteins (γ-subunit of AMP-activated protein kinase, citron, the GTP-binding protein H-ray, paxillin, acidic ribosomal phosphoprotein P0, pancreatic phospholipase A2, splicing factor SRp30, cytochrome C oxidase subunit VIa, short chain acyl CoA dehydrogenase, HNF-1α, thyroid receptor interactor (TRIP14) protein, Ca2+/calmodulin-dependent protein kinase, P2X4 purinoceptor and restin), 5 pseudogenes (metallopanstimulin-like, cell surface heparin binding protein-like, ribosomal protein L12-like, nucleoside diphosphate kinase-like and ADP ribosylation factor-like), 12 ESTs (yq81d09, yd50d03, IB383, hbc3028, yu36h05, yn75d09, yz51b06, yd88g07, ym03h09, ym30e05, WI-6178/c-01h06, WI-6239/c-04b12) and 9 unknown genes (FIG. 9).

These genes were systematically sequenced in affected and unaffected subjects using nested PCR and illegitimate transcription of lymphoblastoid RNA (Kaplan et al., 1992), as well as PCR of individual exons of the gene. Comparison of the sequences of the pancreatic phospholipase A2, γ-subunit of AMP-activated protein kinase, H-ray, cytochrome C oxidase subunit VIa, acidic ribosomal phosphoprotein P0, paxillin, splicing factor SRp30, short chain acyl CoA dehydrogenase, and P2X4 purinoceptor genes from patients and controls revealed a number of polymorphisms but no MODY-associated mutations.

The HNF-1α gene was localized in the interval containing MODY3 using PCR and HNF-1α gene-specific primers (FIG. 9). HNF-1α cDNAs were also isolated at high frequency by cDNA selection from human pancreatic islet cDNA using PAC 254A7, a result consistent with the report of Emens et al. (1992) showing that HNF-1α was expressed in hamster insulinoma cells and functioned as a weak transactivator of the rat insulin I gene. The human HNF-1α gene was isolated and partially sequenced to provide the exon-intron organization and the sequences of introns from which primers could be selected for PCR. The human gene consists of 10 exons with introns 1-8 located in the same positions as in the rat and mouse genes (Bach et al., 1992). Intron 9 interrupts codon 590 (phase 1) and is not present in the rat and mouse genes but does occur in the chicken gene (Horlein et al., 1993), consistent with loss of this intron during the period when humans and rodents shared their last common ancestor. Amplification and direct sequencing of exon 4 of subject EA1 (Edinburgh pedigree, FIG. 5A) showed an insertion of a C in codon 289 (Pro) resulting in a frameshift and premature termination (designated P289fsinsC) (FIG. 10). This mutation was present in all affected members and no unaffected members of this family. It was also not found on screening 55 healthy non-diabetic white subjects (110 chromosomes). Hence, it was concluded that the HNF-1α gene is MODY3, which led the inventors to sequence the HNF-1α gene in other families in which NIDDM cosegregated with markers from the MODY3 region.
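
The effect of a single-nucleotide insertion such as P289fsinsC on the reading frame can be illustrated with a short Python sketch. The toy sequence below is hypothetical and is not the HNF-1α sequence. It only shows how inserting one C shifts every downstream codon and can bring a stop codon into frame, using the standard genetic code.

```python
# Standard genetic code, built in TCAG order ('*' marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(
    zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO_ACIDS)
)


def translate(dna: str) -> str:
    """Translate from the first base until the first stop codon or the end of the sequence."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical toy open reading frame (NOT the HNF-1alpha sequence).
wild_type = "ATGCCCGATAGCTGGAAATAA"
# Insert a single C after the proline codon (CCC), mimicking an insC frameshift.
mutant = wild_type[:6] + "C" + wild_type[6:]

print(translate(wild_type))  # full-length toy peptide
print(translate(mutant))     # truncated peptide caused by the shifted frame
```

In this toy example the mutant peptide matches the wild type only up to the insertion and then terminates early, which is the behavior predicted for the frameshift mutations described here.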

Fifteen additional mutations were found (Table 8), all of which co-segregated with NIDDM, and did not occur in any of at least 50 healthy non-diabetic white subjects. However, there were individuals in several pedigrees (GK pedigree, III-3; Ber pedigree, V-2; and P pedigree, IV-5 and IV-6) who had inherited the mutant chromosome (and at-risk chromosome 12 haplotype) but who were non-diabetic or showed only evidence of impaired glucose tolerance or diabetes during pregnancy. These individuals will likely develop NIDDM in the future. In addition, one subject with NIDDM did not have the mutant allele (Ber pedigree, II-2). He was diagnosed with NIDDM at 65 years of age, at which time he was mildly obese with a body mass index of 27 kg/m², suggesting a diagnosis of late-onset NIDDM rather than MODY. Such heterogeneity within MODY families has been noted previously (Bell et al., 1991; Vionnet, 1992) and is due to the high frequency of late-onset NIDDM, which affects 10% or more of individuals over age 65 years (Kenny et al., 1995). In addition to the mutations listed in Table 8, three amino acid polymorphisms (I/L27, A/V98 and S/N487), four silent polymorphisms (in codons for L17, G288, L459 and T515) and seven polymorphisms in introns were found in the HNF-1α gene (Tables 7 and 8).

Sixteen different mutations in the HNF-1α gene were identified in patients with the MODY3 form of diabetes. The splicing and frameshift mutations would be predicted to result in the expression of a truncated protein having at least amino acids 1-290 of the native protein. The missense mutations, R131Q and P447L, are of residues that are conserved in human, rat, mouse, hamster, chicken, Xenopus and salmon HNF-1α and the structurally-related transcription factor human HNF-1β, suggesting that these residues are functionally important.

HNF-1α is one of a group of transcription factors expressed in liver that act together to confer tissue-specific expression of genes in this tissue (Tronche et al., 1992; Bach et al., 1990). It is also found in kidney, intestine, stomach and pancreas, including islets of Langerhans, and at low levels in spleen and testis, suggesting that it plays a role in transcriptional regulation in these tissues as well. HNF-1α is composed of three functional domains: an NH2-terminal dimerization domain (amino acids 1-32), a DNA binding domain with POU-like and homeodomain-like motifs (amino acids 150-280) and a COOH-terminal transactivation domain (amino acids 281-631). The functional form of HNF-1α is a dimer and HNF-1α may form homodimers or heterodimers with the structurally-related protein HNF-1β (Mendel et al., 1991).

Pontoglio et al. (1996) have generated mice that lack HNF-1α. Homozygous HNF-1α-deficient animals failed to thrive and usually died around the time of weaning. They also suffered from phenylketonuria and renal tubular dysfunction. However, the homozygous HNF-1α-deficient mice did not appear to be diabetic as they had normal blood glucose levels and a normal response to an intravenous bolus injection of glucose. The massive glucosuria in these animals, though, may have masked the presence of diabetes mellitus. The insulin secretory responses of heterozygous HNF-1α-deficient mice, animals that may be most similar to human subjects with HNF-1α mutations and MODY, were not reported. In view of the present findings that mutations in the HNF-1α gene cause early-onset NIDDM, more detailed evaluation of β-cell and liver function in HNF-1α-deficient mice is indicated.

The mechanism by which mutations in the HNF-1α gene, when present on a single allele, can cause diabetes is unclear; however, it is possible that a partial deficiency of HNF-1α could lead to β-cell dysfunction and diabetes. Alternatively, mutations in HNF-1α may cause diabetes by a dominant-negative mechanism (Herskowitz, 1987) by interfering with the function of wild-type HNF-1α and other proteins which act in concert with HNF-1α to regulate transcription in the β-cell and/or liver. All of the HNF-1α gene mutations identified to date would result in the synthesis of a mutant protein impaired in DNA binding or transactivation but not dimerization. These mutant proteins could form non-productive dimers with the product of the normal HNF-1α allele or other proteins such as HNF-1β and thereby impair the normal function of HNF-1α.

The inventors have previously shown that diabetes mellitus in the Zucker diabetic fatty rat, a rodent model of obesity and NIDDM, is associated with decreased expression of a large number of β-cell genes including genes such as insulin whose expression is restricted to the β-cell as well as others with a much broader tissue distribution (Tokuyama et al., 1995). Thus, it is believed that NIDDM is likely to be a disorder of transcription with genetic or acquired defects affecting key proteins that regulate transcription leading to β-cell dysfunction and diabetes.

EXAMPLE 3 

Mutations in HNF-4α Relating to MODY1 Type Diabetes

The PAC clones 114E13, 130B8 and 207N8, containing the human HNF-4α gene, were isolated from a library (Genome Systems, St. Louis, MO) by screening PAC DNA pools with PCR and the primers HNF4P1 (5'-CACCTGGTGATCACGTGGTC-3', SEQ ID NO:81) and HNF4P2 (5'-GTAAGGCTCAAGTCATCTCC-3', SEQ ID NO:82). The sequences of the exons and flanking introns were determined by directly sequencing using primers based on the sequence of the human HNF-4α cDNA (Chartier et al., 1994; Drewes et al., 1996) and selected using the conserved exon-intron organization of the mouse gene (Taraviras et al., 1994) as a guide. Sequencing was carried out using an AmpliTaq FS Dye Terminator Cycle Sequencing Kit (ABI, Foster City, CA) on an ABI Prism™ 377 DNA Sequencer (ABI).
Screening of the HNF-4α gene for mutations.
The eleven exons and flanking introns of the HNF-4α gene of an affected subject from each family in which MODY cosegregated with markers spanning the MODY1 region of chromosome 20 (subjects with the MODY1 form of NIDDM) were amplified using PCR and specific primers (Table 9). PCR conditions were denaturation at 94°C for 5 min followed by 35 cycles of denaturation at 94°C for 30 sec, annealing at 60°C for 30 sec and extension at 72°C for 30 sec, and a final extension at 72°C for 10 min. The PCR products were purified using a Centricon-100 membrane (Amicon, Beverly, MA) and sequenced from both ends using the primers shown in Table 9, an AmpliTaq FS Dye Terminator Cycle Sequencing Kit and an ABI Prism™ 377 DNA Sequencer. The presence of the specific mutation in other family members was assessed by digestion with the Bfa I restriction endonuclease, whose site resulted from the mutation, and gel electrophoresis. At least 100 normal unrelated healthy non-diabetic non-Hispanic white subjects (200 normal chromosomes) were also similarly screened. DNA polymorphisms identified during the course of screening patients for mutations were characterized by PCR and direct sequencing, or digestion with an appropriate restriction endonuclease and gel electrophoresis.

Table 10 lists the DNA polymorphisms and mutations identified in the coding region of the HNF-4α gene. Of course, these are exemplary polymorphisms and those of skill in the art will easily be able to employ the methods and descriptions set forth in the present invention to identify other polymorphisms. FIG. 7 shows an alignment of the HNF-4α protein sequence from humans with sequences from mouse, X. laevis and Drosophila. The putative DNA binding sites are underlined and the putative ligand binding sites are in bold. The DNA sequences for exon 1, exon 1B, exon 2, exon 3, exon 4, exon 5, exon 6, exon 7, exon 8, exon 9 and exon 10 of HNF-4α are shown in FIG. 8A, FIG. 8B, FIG. 8C, FIG. 8D, FIG. 8E, FIG. 8F, FIG. 8G, FIG. 8I, FIG. 8H, FIG. 8I and SEQ ID NO:34, SEQ ID NO:36, SEQ ID NO:38, SEQ ID NO:40, SEQ ID NO:42, SEQ ID NO:44, SEQ ID NO:46, SEQ ID NO:48, SEQ ID NO:50, SEQ ID NO:52, and SEQ ID NO:54, respectively. It is contemplated that mutations in any of these exons, or the related intron regions therebetween, of HNF-4α will result in MODY1 type diabetes.

Table 10.

Polymorphisms and Mutations in the Human HNF-4α Gene

Exon   Codon   Nucleotide change       Frequency

4      130     ACT(Thr)→ATT(Ile)       C:T = 105:5; C, 0.95; T, 0.05
7      273     GAT(Asp)→GAC(Asp)       T:C = 169:1; T, 0.994; C, 0.006
7      268     CAG(Gln)→TAG(stop)      0/216 control chromosomes
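
The allele frequencies in Table 10 follow from the chromosome counts given in the same table (105:5 and 169:1). A minimal Python sketch of that arithmetic is shown below. The function name is hypothetical, and the counts are those listed in the table.

```python
def allele_frequencies(chromosome_counts: dict[str, int]) -> dict[str, float]:
    """Convert per-allele chromosome counts into allele frequencies."""
    total = sum(chromosome_counts.values())
    return {allele: count / total for allele, count in chromosome_counts.items()}


# Chromosome counts as listed in Table 10.
codon_130 = allele_frequencies({"C": 105, "T": 5})    # 110 chromosomes (55 subjects)
codon_273 = allele_frequencies({"T": 169, "C": 1})    # 170 chromosomes

print({allele: round(freq, 2) for allele, freq in codon_130.items()})
print({allele: round(freq, 3) for allele, freq in codon_273.items()})
```

The 110 chromosomes counted at codon 130 correspond to the 55 unrelated subjects in whom the Ile allele frequency of 5% is reported.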

The R-W pedigree, which includes more than 360 members spanning 6 generations and 74 members with diabetes including those with MODY, has been studied prospectively since 1958 (Fajans, 1989). The members of this family are descendants of a man who was born in East Prussia in 1809 and emigrated to Detroit, Michigan in 1861 with his four sons, three of whom were diabetic, and five daughters, one of whom was diabetic (Fajans, 1989; Fajans et al., 1994). Linkage studies have shown that the gene responsible for MODY in this family, MODY1, is tightly linked to markers in chromosome band 20q12-q13.1 with a multipoint lod score >14 in those branches of the family in which MODY is segregating (Bell et al., 1991; Bowden et al., 1992; Irwin et al., 1994). The analysis of key recombinants in the R-W pedigree localized MODY1 to a 13-cM interval (~7 Mb) between D20S169 and D20S176, an interval which also includes the gene encoding HNF-4 (Stoffel, M. et al., 1996). The demonstration in the previous examples that mutations in the HNF-1α gene are the cause of the MODY3 form of NIDDM prompted the inventors to screen the HNF-4α gene for mutations in the R-W pedigree.

The human HNF-4α gene consists of 11 exons with the introns being located in the same positions as in the mouse gene (Taraviras et al., 1994). Alternative splicing generates a family of HNF-4α mRNAs, HNF-4α1, α2 and α4, the latter two of which contain inserts of 30 and 90 nucleotides, respectively (Taraviras et al., 1994; Laine et al., 1994; Drewes, 1996). Of these, HNF-4α2 mRNA appears to be the most abundant transcript in many tissues. In contrast to a previous report (Drewes et al., 1996), the inventors' studies show that HNF-4α4 mRNA encodes a truncated and presumably nonfunctional form of HNF-4α. The sequence of exon 1B, the exon encoding the insertion in HNF-4α4 mRNA, revealed an additional T between nucleotides 219 and 220 in both alleles of five unrelated individuals (10 chromosomes) not present in the cDNA sequence (Drewes et al., 1996), which causes a frameshift and the generation of a protein of 98 amino acids whose function, if any, is unknown. The 11 exons of the HNF-4α gene of two affected subjects, V-20 and 22, and one unaffected subject, VI-9, from the R-W pedigree were amplified and the PCR products sequenced directly. The sequences were identical to one another and to the cDNA (Drewes et al., 1996; Laine et al., 1994) except for C→T substitutions in exon 4, codon 130 and exon 7, codon 268. The C→T substitution in codon 130 results in a Thr (ACT)→Ile (ATT) substitution and is a polymorphism (T/I130) with a frequency of the Ile allele of 5% in a group of 55 unrelated nondiabetic non-Hispanic white subjects. The C→T substitution in codon 268 results in a nonsense mutation, CAG (Gln)→TAG (AM) (Q268X). The nonsense mutation was confirmed by cloning and sequencing PCR products derived from both alleles. The Q268X mutation created a site for the enzyme Bfa I, with digestion of the normal allele generating fragments of 281 and 34 bp, and the mutant allele, 152, 129 and 34 bp, facilitating testing for this mutation in other members of the R-W pedigree. In the R-W pedigree, Ile130 and the amber mutation at codon 268 were present in the same allele.
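
The Bfa I digestion pattern described above can be checked with a short Python sketch. Bfa I recognizes the sequence CTAG, so the CAG→TAG change can create a site only if the base preceding codon 268 is a C; that flanking base is inferred here and is not stated in the text. The cut positions within the 315-bp product (281 + 34 bp) are likewise assumptions, chosen only to be consistent with the reported fragment sizes.

```python
BFA_I_SITE = "CTAG"  # Bfa I recognition sequence


def fragment_sizes(amplicon_length: int, cut_positions: list[int]) -> list[int]:
    """Return digestion fragment sizes (bp), largest first, for the given cut positions."""
    edges = [0] + sorted(cut_positions) + [amplicon_length]
    return sorted((right - left for left, right in zip(edges, edges[1:])), reverse=True)


# Assumed context: a C immediately upstream of codon 268 (inferred, not from the text).
normal_context = "C" + "CAG"   # Gln codon, no Bfa I site
mutant_context = "C" + "TAG"   # amber codon, creates CTAG
print(BFA_I_SITE in normal_context, BFA_I_SITE in mutant_context)

AMPLICON_LENGTH = 281 + 34     # PCR product size implied by the normal-allele fragments
CONSTITUTIVE_CUT = 281         # hypothetical position of the site present in both alleles
MUTATION_CUT = 152             # hypothetical position of the site created by Q268X

print(fragment_sizes(AMPLICON_LENGTH, [CONSTITUTIVE_CUT]))                # normal allele
print(fragment_sizes(AMPLICON_LENGTH, [MUTATION_CUT, CONSTITUTIVE_CUT]))  # mutant allele
```

A heterozygous carrier would be expected to show the fragments of both alleles on the gel, that is, 281, 152, 129 and 34 bp.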

The Q268X mutation cosegregated with the at-risk haplotype and NIDDM in the R-W pedigree and was not observed on screening 108 healthy nondiabetic non-Hispanic white subjects (216 normal chromosomes). Seven subjects in the R-W pedigree who have inherited the mutant allele (V-18, 37 and 48; and VI-6, 11, 15 and 20) have normal glucose tolerance. The ages of five of these subjects (V-48, and VI-6, 11, 15 and 20) are less than 25 years and thus they are still within the age range when diabetes usually develops in at-risk individuals in this family. Of the others, subject V-18 is 44 years of age and has shown normal glucose tolerance on all oral glucose tolerance tests, and subject V-37, who is 36 years of age, had one glucose tolerance test characteristic of impaired glucose tolerance and one of diabetes at ages 16-17 years, but for the past 19 years each glucose tolerance test has been normal even though she has a low insulin response to orally administered glucose. She is very lean and active, and has increased sensitivity to insulin during the frequently sampled intravenous glucose tolerance test. During a prolonged low-dose glucose infusion, she became markedly hyperglycemic (Herman et al., 1994; Byrne et al., 1995). Two subjects (V-1 and 4) who have the mutation were considered nondiabetic based on medical history and their affection status needs to be evaluated by oral glucose tolerance testing. The results indicate that the nonsense mutation in the HNF-4 gene in the R-W pedigree is highly but not completely penetrant, although the age of diabetes onset is variable.
In addition to subjects who inherited the Q268X mutation but are presently nondiabetic, there are subjects in the R-W pedigree who have NIDDM but did not inherit the Q268X mutation or at-risk haplotype. Subject IV-9 was diagnosed with NIDDM at 48 years of age and was hyperinsulinemic, a diagnosis consistent with late-onset NIDDM rather than MODY. The inventors also tested her six children, one of whom had NIDDM and another impaired glucose tolerance, and all had two normal alleles. Similarly, 10 children of subject III-7, five of whom had NIDDM, were also tested, and none had inherited the Q268X mutation, suggesting that the NIDDM in this branch of the R-W family is of a different etiology. Finally, the five nondiabetic children of III-11 were also tested and all were normal. The presence of both MODY and late-onset NIDDM in the R-W family has been noted previously (Bell et al., 1991; Bowden et al., 1992). The MODY phenotype results from a mutation in the HNF-4 gene. The cause(s) of the late-onset NIDDM is unknown.

HNF-4 is a member of the steroid/thyroid hormone receptor superfamily and is expressed at highest levels in liver, kidney and intestine (Xanthopoulos et al., 1991; Sladek et al., 1990). It is also expressed in pancreatic islets and insulinoma cells (Miquerol et al., 1994). In liver, HNF-4α is a key regulator of hepatic gene expression and is a major activator of HNF-1α, which in turn activates expression of a large number of liver-specific genes including those involved in glucose, cholesterol and fatty acid metabolism (Sladek et al., 1990; Kuo et al., 1992). Its expression in kidney, intestine and pancreatic islets implies that it plays a central role in tissue-specific regulation of gene expression in these tissues as well, although its specific function in nonhepatic tissues has not been addressed. Homozygous loss of functional HNF-4α protein causes embryonic lethality characterized by defects in gastrulation, underscoring the key role played by this transcription factor in development and differentiation (Chen et al., 1994). The phenotype of the heterozygous animals was not described and further studies are necessary to determine if they represent a mouse model of MODY.

HNF-4<x defines a subclass of nuclear receptors which reside primarily in the nucleus and bind to 
their recognition site and regulate transcription as homodimers (Sladek et a/., 1994; Kuo et at., 1992). 
The key role played by HNF-4ct in the regulation of hepatic gene expression is well established (Sladek et 
al., 1994; Kuo et al., 1992). However, its role as well as that of HNF-1α, the MODY3 product and a
downstream target of HNF-4α action, in regulating gene expression in the insulin-secreting pancreatic
β-cell is largely unknown, although Emens et al. (1992) have shown that HNF-1α is a weak transactivator
of the insulin gene. Thus, the mechanism by which mutations in HNF-4α result in an autosomal dominant
form of NIDDM characterized by pancreatic β-cell dysfunction is unclear. The nonsense mutation in
HNF-4α found in the R-W family is predicted to result in the synthesis of a protein of 267 amino acids
with an intact DNA binding domain. However, it is missing the regions involved in dimerization and
transcriptional activation in other members of the steroid/thyroid hormone superfamily (Zhang et al.,
1994; Bourguet et al., 1995; Renaud et al., 1995; Wagner et al., 1995) and as a consequence is predicted
to be unable to dimerize, bind to its recognition site and activate transcription. Thus, the dominant
inheritance is due to a reduction in the amount of HNF-4α per se rather than a dominant negative
mechanism. The decreased levels of functional HNF-4α appear to have a critical effect on β-cell function,
perhaps as a consequence of decreased HNF-1α gene expression, mutations in this gene also leading to
MODY as described in the examples above. Prediabetic subjects with mutations in either the HNF-4α or
HNF-1α genes exhibit similar abnormalities in glucose-stimulated insulin secretion with normal insulin
secretion rates at lower glucose concentrations but lower than normal rates as the glucose concentration
increases (Byrne et al., 1995), a result consistent with HNF-4α and HNF-1α affecting a common pathway in
the pancreatic β-cell. The absence of overt hepatic, renal or gastrointestinal dysfunction in affected
members of the R-W pedigree suggests that the levels of HNF-4α in these tissues, although possibly
lower than normal, are sufficient to ensure normal function or that alternative pathways are sufficient for
expression of key genes. However, detailed studies of hepatic glucose production and metabolism have
not been performed in subjects from the R-W pedigree and it is possible that subtle alterations in these
processes may be present.

The demonstration that MODY can result from mutations in the HNF-1α and HNF-4α genes
suggests that this form of NIDDM is primarily a disorder of abnormal gene expression. In this regard,
genes encoding other proteins in the HNF-1α/HNF-4α regulatory cascade such as other members of the
HNF-1 (Mendel et al., 1994) and HNF-4 families (Drewes et al., 1996) as well as HNF-3 (Lai et al., 1993),
HNF-6 (Lemaigre et al., 1996), and perhaps dimerization cofactor of HNF-1 (Mendel et al., 1991) should
be considered as candidates for other forms of MODY and/or late-onset NIDDM. The role of HNF-4α in
the development of the more common late-onset NIDDM is unknown. There is no evidence for linkage of
markers flanking the HNF-4α gene with late-onset NIDDM in Mexican Americans or Japanese, implying
that mutations in the HNF-4α gene are unlikely to be a significant genetic factor contributing to the
development of late-onset NIDDM. However, acquired defects in HNF-4α expression may contribute, at
least in part, to the β-cell dysfunction which characterizes late-onset NIDDM (Polonsky et al., 1996),
especially if it plays a central role in regulating gene expression in the pancreatic β-cell as suggested by
its association with MODY. Furthermore, the similarity between HNF-4α and ligand-dependent
transcription factors raises the possibility that HNF-4α and the genes it regulates respond to an
unidentified ligand. The identification of such a ligand by the methods of the present invention will lead
to new approaches for treating diabetes.

EXAMPLE 4 

Organization and Partial Sequence of the HNF-4α/MODY1 Gene and Identification of a
Missense Mutation, R127W, in a Japanese Family with MODY

HNF-4α is a member of the nuclear receptor superfamily, a class of ligand-activated transcription
factors. A nonsense mutation in the gene encoding this transcription factor has been recently found in a
white family with one form of maturity-onset diabetes of the young, MODY1. In the present example, the
inventors report the exon-intron organization and partial sequence of the human HNF-4α gene. In
addition, the inventors have screened the twelve exons, flanking introns and minimal promoter region for
mutations in a group of 57 unrelated Japanese subjects with early-onset NIDDM/MODY of unknown
cause. Eight nucleotide substitutions were noted, of which one resulted in the mutation of a conserved
arginine residue, Arg127 (CGG)→Trp (TGG) (designated R127W), located in the T-box, a region of the
protein that may play a role in HNF-4α dimerization and DNA binding. This mutation was not found in
214 unrelated nondiabetic subjects (53 Japanese, 53 Chinese, 51 white and 57 African-American). The
R127W mutation was only present in three of five diabetic members in this family, indicating that it is not
the only cause of diabetes in this family. The remaining seven nucleotide substitutions were located in
the proximal promoter region and introns. They are not predicted to affect the transcription of the gene
or mRNA processing and represent polymorphisms and rare variants. The results suggest that mutations
in the HNF-4α gene may cause early-onset NIDDM/MODY in Japanese but they are less common than
mutations in the HNF-1α/MODY3 gene. The information on the sequence of the HNF-4α gene and its
promoter region will facilitate the search for mutations in other populations and studies of the role of this
gene in determining normal pancreatic β-cell function.

1. Methods 

Isolation and partial sequence of the human HNF-4α gene

Three P1-derived artificial chromosome (PAC) clones, 114E13, 130B8 and 207N8, containing the
human HNF-4α gene were isolated by screening PAC DNA pools (Genome System, St. Louis, MO) by
PCR™ with HNF-4α specific primers (Yamagata et al., 1996a). The partial sequence of the HNF-4α gene
was determined using DNA from PACs 114E13 and 207N8 and sequence-specific primers with an
AmpliTaq FS Dye Terminator Cycle Sequencing Kit and ABI Prism™ 377 DNA sequencer (ABI, Foster
City, CA). The promoter sequence was examined for transcription factor binding sites using MatInspector
(Quandt et al., 1995) and TFSEARCH (Version 1.3, http://www.genome.ad.jp/kit/tfsearch.html). The
sequences of alternatively-spliced mRNAs were confirmed by sequencing PCR™ products generated by
amplification of human liver cDNA using specific primers.
Screening of the HNF-4α gene for mutations

The 12 exons, flanking introns and minimal promoter region were screened for mutations by
amplifying and directly sequencing both strands of the PCR™ product using specific primers (the
sequences of the primers are available at www.diabetes.org/diabetes). The sequence of the missense
mutation (R127W) was confirmed by cloning the PCR™ product into pGEM-T (Promega, Madison, WI) and
sequencing clones representing both alleles. The R127W mutation leads to loss of an Msp I site and
subjects were tested for the presence of this mutation by digestion of the PCR™ product of exon 4 with
Msp I, separation of the fragments by electrophoresis on a 3% NuSieve® 3:1 agarose gel (FMC
BioProducts, Rockland, ME) and visualization by ethidium bromide staining. The sequences of the DNA
polymorphisms are based on sequencing both strands of the PCR™ product and were not confirmed
directly by cloning and sequencing the PCR™ product.
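
The restriction-based test described above can be illustrated with a minimal sketch. It assumes only that Msp I recognizes CCGG and cuts after the first C; the 32-bp amplicon below is a hypothetical toy sequence, not the real exon 4 PCR™ product, and the function name `digest` is introduced here for illustration.

```python
def digest(seq, site="CCGG", cut_offset=1):
    """Return fragment lengths after cutting `seq` at every occurrence of `site`.

    Msp I recognizes CCGG and cuts after the first C (C^CGG), hence cut_offset=1.
    """
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


# Hypothetical 32-bp amplicon with two Msp I sites (NOT the real exon 4 sequence).
normal = "A" * 10 + "CCGG" + "T" * 8 + "CCGG" + "A" * 6
# C->T change inside the second site (CCGG -> CTGG), analogous to CGG -> TGG.
mutant = normal[:23] + "T" + normal[24:]

normal_fragments = digest(normal)   # three fragments: both sites are cut
mutant_fragments = digest(mutant)   # two fragments: the altered site is no longer cut
print(normal_fragments)
print(mutant_fragments)
```

In this toy case the two smaller fragments of the normal allele are replaced in the mutant allele by a single fragment equal to their sum, which is the pattern a gel-based genotyping assay of this kind relies on.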
Subjects 

The study population consisted of 57 unrelated Japanese subjects attending the Diabetes Clinic,
Tokyo Women's Medical College who were diagnosed with NIDDM before 25 years of age and/or who
were members of families in which NIDDM was present in three or more generations: age at diagnosis,
20.1±7.5 years (mean±SE); male/female, 31/26; and treatment, insulin - 36, oral hypoglycemic agents -
10, and diet - 11. Thirty-two of the subjects met strict criteria for a diagnosis of MODY (i.e., NIDDM in
at least three generations with autosomal dominant transmission and diagnosis before 25 years of age in
at least one affected subject). NIDDM was diagnosed using the criteria of the World Health Organization
(Bennett et al., 1994). At the time of recruitment, informed consent was obtained from each subject and
a blood sample was taken for DNA isolation. Fifty-three unrelated nondiabetic Japanese subjects were
tested for each nucleotide substitution and mutation to determine if the sequence change was a
polymorphism or disease-associated mutation. In addition, 53 Chinese (15), 51 white (16), and 57
African-American unrelated nondiabetic subjects (16) were tested for the R127W mutation.

2. Results 

Organization and partial sequence of human HNF-4α gene. The human HNF-4α gene (gene
symbol, TCF14) consists of 12 exons spanning approximately 30 kb, of which about 10 kb were
sequenced including 1 kb of the promoter region (the gene sequence is available at
www.diabetes.org/diabetes). Human HNF-4α mRNA is alternatively spliced (Hata et al., 1992; Chartier
et al., 1994; Drewes et al., 1996; Kritis et al., 1996) which may generate as many as six different forms
of HNF-4α (FIG. 12). HNF-4α2 is the predominant form present in many adult tissues including liver,
kidney and intestine. The inventors have used RT-PCR™ to determine which HNF-4α transcripts are
expressed in human pancreatic islets. This analysis showed that islets express mRNAs for HNF-4α1, 2
and 3. The inventors could not detect islet transcripts that included exons 1C and 1B although
transcripts containing these two exons could be detected in human liver by RT-PCR™.

The sequence of 1 kb of the promoter region of the human HNF-4α gene was determined (FIG.
13). The comparison of the sequences of the human and mouse genes showed regions of sequence
conservation that included the predicted start of transcription and the binding sites for several
transcription factors including HNF-6, AP-1, HNF-3, HNF-1α and NF-1. The transcription start site for
the human gene has not been determined directly but has been inferred from studies of the mouse gene
which showed multiple start sites spread over a 10 bp interval (Zhong et al., 1994; Taraviras et al.,
1994) of which one was defined as nucleotide +1 (Zhong et al., 1994). The sequence homology in the
promoter of the human and mouse genes suggests that transcription of the HNF-4α gene may be
regulated in a similar manner. In this regard, Zhong et al. (1994) have shown that the major
promoter activity in a hepatoma cell line was associated with a 126 bp fragment of the mouse promoter
(nucleotides 289-414 in FIG. 13). There is 83% identity between the human and mouse sequences in this
minimal promoter region.

Mutations and polymorphisms in the HNF-4α gene. The twelve exons, flanking introns and
minimal promoter region were screened for mutations in 57 unrelated Japanese subjects with early-onset
NIDDM/MODY. This analysis revealed one putative mutation (FIG. 14) and seven DNA
polymorphisms/variants (Table 11). The putative mutation in exon 4 at codon 127, CGG (Arg)→TGG
(Trp) (R127W), alters a conserved amino acid that is located in the T-box, a region implicated in receptor
dimerization and DNA binding (Lee et al., 1993; Rastinejad et al., 1995; Gronemeyer and Moras, 1995;
Jiang and Sladek, 1997). The C→T substitution in codon 127 results in the loss of a site for the
enzyme Msp I and digestion of the normal allele generates fragments of 104, 91, and 78 bp, whereas the
mutant allele generates fragments of 104 and 167 bp. PCR™-RFLP analysis showed that the R127W
mutation was not present in any of 214 unrelated nondiabetic subjects of different ethnic groups (53
Japanese, 53 Chinese, 51 white and 57 African-American).

TABLE 11

DNA Polymorphisms/Variants in the Human HNF-4α Gene in Japanese Subjects

                                                    Allele frequency
Location     Nucleotide         Substitution    Early-onset        Nondiabetic

Promoter     nt 922             G→A             G-0.99, A-0.01     G-1.00, A-0.00
Intron 1A    nt 1364 (+109)     T→C             T-0.99, C-0.01     T-1.00, C-0.00
             nt 1486 (-21)      G→A             G-0.99, A-0.01     G-0.99, A-0.01
Intron 1C    nt 2218 (105)      G→A             G-0.99, A-0.01     G-1.00, A-0.00
Intron 1B    nt 2420 (+8)       A→G             G-0.99, A-0.01     G-0.99, A-0.01
             nt 3142 (-38)      T→C             T-0.28, C-0.72     T-0.24, C-0.76
             nt 3175 (-5)       C→T             C-0.84, T-0.16     C-0.86, T-0.14

The R127W mutation was present in three of five diabetic members of the J2-21 family, a MODY
family characterized by severe microvascular complications (Iwasaki et al., 1988) (FIG. 15). In addition,
subject II-2 must be a carrier since she has children with both normal homozygous and heterozygous
genotypes. The age at diagnosis of diabetes in two of the four subjects with the R127W mutation was
<25 years (subject II-2, 16 years; and subject III-4, 17 years). One of the subjects with the R127W
mutation was diagnosed with diabetes at 90 years of age, indicating the variable penetrance of the
mutant allele. Another subject, the 12 year-old son of subject II-4, has inherited the mutant allele but is
nondiabetic. However, he is not yet beyond the age at risk and may develop diabetes in the future. There
are two subjects with diabetes in the J2-21 family who did not inherit the at-risk allele (subjects III-3 and
III-6). Such etiological heterogeneity has been noted previously (Bell et al., 1991).

The seven DNA polymorphisms/variants were located in the promoter region and the introns
(Table 11, FIG. 13). In subject J2-96 (FIG. 15), there was a G→A substitution at nucleotide 922 in the
proximal promoter region which changes the human sequence so that it more closely resembles the
sequence of the mouse gene (FIG. 13). This substitution was not found on screening 53 nondiabetic
subjects. Since this substitution does not alter a conserved residue or disrupt the binding site for one of
the factors predicted to regulate transcription of the HNF-4α gene, the inventors believe that it is a rare
variant rather than a diabetes-associated mutation. However, further studies are necessary to distinguish
between these two possibilities.

The six substitutions found in introns do not disrupt the conserved GT and AG
dinucleotides at the splice donor and acceptor sites, respectively, and are thus unlikely to affect splicing.
The substitutions at nucleotides 1486, 2420, 3142 and 3175 were found in both diabetic and
nondiabetic Japanese subjects, indicating that they are polymorphisms rather than diabetes-associated
mutations. The substitutions at nucleotides 1364 and 2218 were found only in two different unrelated
subjects with early-onset NIDDM/MODY. The inventors believe that these are rare variants rather than
diabetes-associated mutations as they are not near the splice donor and acceptor sites but are rather in
the central portion of the intron.

EXAMPLE 5

Hepatic Function in a Family with a Nonsense Mutation (R154X) in the
HNF-4α/MODY1 Gene

MODY is a genetically heterogeneous monogenic disorder characterized by autosomal dominant
inheritance, onset usually before 25 years of age and abnormal pancreatic β-cell function. Mutations in
the hepatocyte nuclear factor (HNF)-4α/MODY1, glucokinase/MODY2 and HNF-1α/MODY3 genes can
cause this form of diabetes. In contrast to the glucokinase and HNF-1α genes, mutations in the HNF-4α
gene are a relatively uncommon cause of MODY and the inventors' understanding of the MODY1 form of
diabetes is based on studies of only a single family, the R-W pedigree. Here the inventors report the
identification of another family with MODY1 and the first in which there has been a detailed
characterization of hepatic function. The affected members of this family, Dresden-11, have inherited a
nonsense mutation, R154X, in the HNF-4α gene and are predicted to have reduced levels of this
transcription factor in the tissues in which it is expressed including pancreatic islets, liver, kidney and
intestine. Subjects with the R154X mutation exhibited a diminished insulin secretory response to oral
glucose. HNF-4α plays a central role in tissue-specific regulation of gene expression in the liver including
the control of synthesis of proteins involved in cholesterol and lipoprotein metabolism and the coagulation
cascade. However, subjects with the R154X mutation showed no abnormalities in lipid metabolism or
coagulation except for a paradoxical 3.3-fold increase in serum lipoprotein(a) levels. Nor was there any
evidence of renal dysfunction in these subjects. The results suggest that MODY1 is primarily a disorder
of β-cell function.

1. Methods 

Subjects. 

The study population consisted of members of twelve unrelated families with early-onset NIDDM
ascertained through the Department of Internal Medicine III, University Clinic Carl Gustav Carus of the
Technical University, Dresden, Germany. Families were selected based on the presence of
noninsulin-dependent (type 2) diabetes mellitus (NIDDM) in two or more generations with diagnosis
before 35 years of age in at least one subject. Sufficient family data were available to suggest a diagnosis
of MODY in nine of these families (i.e., NIDDM in three generations with autosomal dominant inheritance
and onset before 25 years of age in at least one affected subject) (Fajans et al., 1994). The remaining
three families were classified as having early-onset NIDDM. The average age at diagnosis of diabetes in
affected members of these twelve families was 29.9±2.8 years (range, 14-60 years) (mean±SEM) and
included 18 men and 13 women of whom 12, 12 and 7 were being treated with insulin, oral hypoglycemic
agents and diet, respectively. At the time of recruitment, informed consent was obtained from each
subject and blood and urine samples were obtained for DNA isolation and clinical testing.
Screening HNF-4α gene for mutations.

The minimal promoter region (nucleotides -21 to -459) (Zhong et al., 1994) and 10 exons
encoding the HNF-4α2 form (Drewes et al., 1996) of HNF-4α were screened for mutations by polymerase
chain reaction (PCR™) amplification and direct sequencing of both strands of the amplified PCR™ product
as described previously (Yamagata et al., 1996). Sequence changes were confirmed by cloning the PCR™
product into pGEM-4Z (Promega, Madison, WI) and sequencing clones derived from both alleles. The
sequences of the primers for the amplification and sequencing of the minimal promoter region are
P1, 5'-CAAGGATCCAGAAGATTGGC-3' (SEQ ID NO:120), and P2, 5'-CGTCCTCTGGGAAGATCTGC-3' (SEQ
ID NO:121); the size of the PCR™ product is 479 bp. The sequence of the promoter of the human
HNF-4α gene has been deposited in the GenBank database with accession number U72959.
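
As a brief illustration of how the two promoter primers quoted above might be inspected before use, the following sketch computes their length, GC fraction and reverse complement. The primer sequences are those given in the text; the helper names `reverse_complement` and `gc_fraction` are hypothetical and are introduced here only for illustration.

```python
# Translation table mapping each DNA base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def gc_fraction(seq):
    """Return the fraction of bases in `seq` that are G or C."""
    return sum(base in "GC" for base in seq) / len(seq)


P1 = "CAAGGATCCAGAAGATTGGC"  # SEQ ID NO:120, as quoted in the text
P2 = "CGTCCTCTGGGAAGATCTGC"  # SEQ ID NO:121, as quoted in the text

for name, primer in (("P1", P1), ("P2", P2)):
    print(name, len(primer), gc_fraction(primer), reverse_complement(primer))
```

Both primers are 20 nucleotides long; a check of this kind is only a sanity test on the sequences themselves and says nothing about where they anneal in the promoter.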
Linkage analysis.

Family members were typed with the markers D20S43, D20S89, D20S96, D20S119, D20S169
and D20S424, all of which are tightly linked to the HNF-4α gene (Stoffel et al., 1996). Tests for linkage
were carried out using the haplotype formed from these markers and assuming a recombination frequency
between adjacent markers of 0.001 with the computer program ILINK (Lathrop et al., 1984; Lathrop and
Lalouel, 1984). The frequencies of the haplotypes were estimated from the data. The analysis assumed
a disease allele frequency of 0.001 and two liability classes. Liability class 1 included individuals who
were ≥25 years of age with penetrances of 0.00, 0.95 and 0.95 for the normal homozygote, heterozygote
and susceptible homozygote, respectively. Liability class 2 included individuals who were <25 years of
age with penetrances of 0.00, 0.60 and 0.95 for the normal homozygote, heterozygote and susceptible
homozygote, respectively. The affection status of the one subject with impaired glucose tolerance was
coded as affected. The maximum expected lod score (ELOD) was determined using the computer program
SLINK (Ott, 1989; Weeks et al., 1990).
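
The liability classes above can be written down as a small lookup, and the idea of a lod score can be illustrated with the textbook two-point formula for phase-known, fully informative meioses. This is a simplified sketch and not the ILINK or SLINK computation, which works on the whole pedigree with the stated penetrances; it also assumes that liability class 1 covers ages of 25 years and above. All function names are hypothetical.

```python
import math

# Penetrances from the text, keyed by liability class and genotype.
PENETRANCE = {
    1: {"normal": 0.00, "heterozygote": 0.95, "susceptible": 0.95},  # assumed: age >= 25
    2: {"normal": 0.00, "heterozygote": 0.60, "susceptible": 0.95},  # age < 25
}


def liability_class(age_years):
    """Assign the liability class from age (assumption: class 1 is >= 25 years)."""
    return 1 if age_years >= 25 else 2


def penetrance(age_years, genotype):
    """Look up the probability of being affected for a given age and genotype."""
    return PENETRANCE[liability_class(age_years)][genotype]


def simple_lod(n_meioses, n_recombinants, theta):
    """Two-point lod score for phase-known, fully informative meioses.

    lod(theta) = log10( theta^r * (1 - theta)^(n - r) / 0.5^n )
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie between 0 and 0.5")
    likelihood = (theta ** n_recombinants) * ((1.0 - theta) ** (n_meioses - n_recombinants))
    if likelihood == 0.0:
        return float("-inf")
    return math.log10(likelihood / 0.5 ** n_meioses)


print(penetrance(14, "heterozygote"))
print(simple_lod(4, 0, 0.0))
```

Under this simplified model, each informative non-recombinant meiosis contributes log10(2), about 0.30, to the lod score at a recombination fraction of zero, which is why small pedigrees can only ever yield modest lod scores.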
Clinical Studies. 

A standard 75 g oral glucose tolerance test was given to subjects after a 12 h overnight fast.
Treatment with insulin and oral hypoglycemic agents was discontinued 12 h and 24 h, respectively,
before testing. Blood samples for glucose, insulin, C-peptide and proinsulin were drawn at 0, 30, 60, 90
and 120 min. Fasting blood samples were also drawn for the measurement of insulin, islet cell and
glutamic acid decarboxylase (GAD) antibodies, glycosylated hemoglobin (HbA1c), lipoprotein(a),
apolipoproteins AI, AII, B, CII, CIII and E, cholesterol (total and in VLDL, LDL, HDL, HDL2 and HDL3),
triglycerides (total and in VLDL and LDL+HDL), coagulation time (Quick test) and partial thromboplastin
time (PTT), fibrinogen, von Willebrand factor antigen (vWF:Ag), plasminogen activator inhibitor-1 (PAI-1),
tissue-type plasminogen activator (tPA), alanine aminotransferase, γ-glutamyl transferase, bilirubin,
albumin, total protein, hemoglobin, creatinine, urea, amylase, lipase and uric acid. A urine sample (from a
24-hour collection of urine) was taken for measurements of creatinine and microalbumin.
Assays. 

Blood glucose was measured with a hexokinase method (Boehringer-Mannheim, Mannheim,
Germany), plasma insulin and C-peptide by radioimmunoassay (DPC Biermann GmbH, Bad Nauheim,
Germany; and C-peptide RIA, Diagnostic Systems Laboratories, Sinsheim, Germany, respectively), plasma
proinsulin by ELISA (DRG Instruments, Marburg, Germany), HbA1c by HPLC (DIAMAT Analyzer, Bio-Rad,
Munich, Germany), fibrinogen by the Clauss method (Fibrinogen A Kit, Boehringer-Mannheim), PAI-1 by
bioimmunoassay and ELISA (TC® Actibind PAI-1 and TC® PAI-1 ELISA, Technoclone/Immuno GmbH
Deutschland, Heidelberg, Germany), tPA by ELISA (TintElize® tPA, Biopool AB, Umeå, Sweden), vWF:Ag
enzymatically (ELISA Asserachrom® vWF, Boehringer-Mannheim), insulin- and GAD-Ab by ELISA and
radioimmunoassay (Elias, Freiburg, Germany), islet cell-Ab by an immunofluorescence assay (using a
positive sample from EUROIMMUN Immunologie GmbH, Groß Grönau, Germany), coagulation and partial
thromboplastin time by the AMAX Analyzer (Munich, Germany). Total cholesterol and cholesterol in VLDL,
HDL, LDL+HDL, and HDL3 were measured by the CHOD-PAP method, and total triglycerides and
triglycerides in VLDL and LDL+HDL by the GPO-PAP method using the Ciba Corning 550 Express Clinical
Chemistry Analyzer (Boehringer-Mannheim). HDL2-cholesterol was calculated using the formula
HDL2 = HDL - HDL3. Samples for the measurement of cholesterol and triglycerides in VLDL, HDL and
LDL+HDL were prepared by preparative ultracentrifugation using a Beckman Optima tabletop TLX
ultracentrifuge with a TLA-120.2 rotor. Serum creatinine, urea, uric acid, total protein, alanine
aminotransferase, γ-glutamyl transferase, bilirubin, amylase and urine creatinine were measured using
the BM Hitachi 717 Chemistry Analyzer (Boehringer-Mannheim). Lipase was measured using the Monarch
System (Sigma Germany, Munich, Germany). Apolipoproteins AI, AII and B and urine microalbumin were
measured using the Behring-Nephelometer BN II (Behringwerke, Marburg, Germany). Apolipoproteins
CIII and E were measured using the Sebia System (Fulda, Germany), apolipoprotein CII using the RID
System (WAK, Bad Homburg, Germany).
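
The HDL2-cholesterol calculation mentioned above is a simple difference, HDL2 = HDL - HDL3. A minimal sketch follows, using the HDL and HDL3 cholesterol values (in mM) listed in Table 12 as inputs; the function name `hdl2_cholesterol` and the rounding to two decimals are assumptions made for illustration.

```python
def hdl2_cholesterol(hdl_total, hdl3):
    """Return HDL2-cholesterol (mM) as total HDL minus HDL3, rounded to 2 decimals."""
    return round(hdl_total - hdl3, 2)


# (HDL, HDL3) cholesterol values in mM taken from Table 12.
table_12_pairs = {
    "normal/mutant (mean)": (1.17, 0.86),
    "normal/normal female": (1.32, 0.88),
    "normal/normal male": (1.26, 0.99),
}

hdl2_values = {label: hdl2_cholesterol(*pair) for label, pair in table_12_pairs.items()}
print(hdl2_values)
```

The computed differences agree with the HDL2 entries reported in Table 12 (0.31, 0.44 and 0.27 mM).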

2. Results 

Identification of a nonsense mutation in the HNF-4α gene.

Twelve families with early-onset NIDDM/MODY were ascertained for genetic studies of MODY in
subjects of German ancestry. Mutations in the HNF-1α/MODY3 gene (Yamagata et al., 1996) were found
in three of these families (Kaisaki et al., 1997). The HNF-4α gene was screened for mutations in one
affected subject from each of the remaining nine families. There was a C→T substitution in codon 154 of
exon 4 in the proband (II-4) of family Dresden-11 (FIG. 16) which generated a nonsense mutation CGA
(Arg)→TGA (OP) (R154X, FIG. 17). The R154X mutation would result in the synthesis of a truncated
protein of 153 amino acids with an intact DNA binding domain but lacking the ligand binding and
transactivation domain (Sladek et al., 1990). In addition to this mutation, there was a silent C→T
substitution in the codon for Ala58 (GCC/GCT) in one subject which did not cosegregate with
MODY/early-onset NIDDM.
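
The distinction drawn above between nonsense, missense and silent substitutions follows directly from the standard genetic code. The sketch below classifies the codon changes named in this document (CGG→TGG, CGA→TGA and GCC→GCT) and derives the length of a truncated product from the position of a premature stop codon. The function names are hypothetical; the codon table is the standard nuclear code.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G ("*" marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(ref_codon, alt_codon):
    """Classify a codon change as 'silent', 'nonsense' or 'missense'."""
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "silent"
    if alt_aa == "*":
        return "nonsense"
    return "missense"


def truncated_length(stop_codon_position):
    """Residues translated before a premature stop at the given codon number."""
    return stop_codon_position - 1


print(classify_substitution("CGG", "TGG"))  # Arg -> Trp (R127W)
print(classify_substitution("CGA", "TGA"))  # Arg -> stop (R154X)
print(classify_substitution("GCC", "GCT"))  # Ala -> Ala (codon 58)
print(truncated_length(154), truncated_length(268))
```

The truncated lengths of 153 and 267 residues match the protein sizes predicted in the text for the R154X and Q268X mutations, respectively.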

The presence of the R154X mutation in other members of the Dresden-11 family was determined
by PCR™ amplification and direct sequencing of exon 4. The R154X mutation cosegregated with MODY
in the Dresden-11 family (FIG. 16). All diabetic subjects had the R154X mutation as did a 14-year old
male (III-2) with impaired glucose tolerance. The at-risk haplotype showed some evidence for linkage
with MODY with a lod score of 1.20 at a recombination fraction of 0.00 (the maximum expected lod score
in this pedigree is 1.20).

Age at diagnosis. 

Three subjects were diagnosed with NIDDM between 15 and 25 years of age and two others at 28
and 44 years (FIG. 16). The subject, I-1, diagnosed with diabetes at 44 years of age had proliferative
retinopathy at the time of diagnosis, suggesting that the onset of diabetes had been many years earlier.

Clinical severity of diabetes. 

The diabetes in the Dresden-11 family was severe and all the diabetic subjects were treated with
either insulin or oral hypoglycemic agents. Subjects with diabetes of long duration (e.g., I-1, II-4) had
diabetic complications including proliferative retinopathy, macrovascular disease (coronary heart disease)
and peripheral polyneuropathy. Surprisingly, none of the subjects with the R154X mutation had evidence
of nephropathy. Thus, the diabetic phenotype of the Dresden-11 family is very similar to that seen in the
R-W pedigree (Fajans et al., 1994). None of the subjects in the Dresden-11 family were positive for islet,
insulin or GAD antibodies.

Insulin-secretory response. 

Previous studies have shown that prediabetic subjects with a mutation in HNF-4α exhibit a
characteristic defect in the normal pattern of glucose-stimulated insulin secretion as well as abnormalities
in other measures of normal β-cell function (Herman et al., 1994; Byrne et al., 1995). The OGTT studies
showed a profound reduction in insulin secretion accompanied by diminished C-peptide and proinsulin
levels in subjects with the R154X mutation (FIG. 18).

None of the subjects with the R154X mutation showed evidence of secondary
hypertriglyceridemia, even though several (I-1, II-4, III-1) had poor metabolic control with HbA1c levels of
10.6, 8.8 and 10.1, respectively (Table 12).

TABLE 12

Clinical Parameters of the Dresden-11 Family

                                              Genotype
Parameter                           Normal/Mutant     Normal/Normal     Reference
                                                      (female/male)     values

Age at diagnosis (years)            26.40 ± 3.47
Current age (years)                 35.50 ± 7.58      62/41
n (females/males)                   2/4               1/1
BMI (kg/m²)                         25.21 ± 1.15      41.08/22.86       <25.00
HbA1c (%)                           8.13 ± 0.78       5.60/5.30         <6.50
Basal insulin (nM)                  0.067 ± 0.005     0.080/0.040       0.059-0.253
Basal C-peptide (nM)                0.60 ± 0.08       0.68/0.45         <1.06
Cholesterol (mM), total             4.72 ± 0.41       5.03/5.01         <5.20
  in VLDL (mM)                      0.79 ± 0.31       0.21/0.70         0.10-1.40
  in LDL (mM)                       2.86 ± 0.25       3.62/3.34         1.80-5.10
  in HDL (mM)                       1.17 ± 0.18       1.32/1.26         0.80-2.50
  in HDL2 (mM)                      0.31 ± 0.06       0.44/0.27         0.10-0.60
  in HDL3 (mM)                      0.86 ± 0.12       0.88/0.99         0.80-1.90
Triglycerides (mM), total           0.70 ± 0.13       0.65/1.45         0.40-2.80
  in VLDL (mM)                      0.43 ± 0.13       0.34/1.06         0.10-2.10
  in LDL+HDL (mM)                   0.28 ± 0.02       0.33/0.47         0.20-0.80
Lipoprotein(a) (mg/l)               816.0 ± 90.4      3.0/6.0           <250.0
ApoB (g/l)                          1.38 ± 0.22       1.33/1.38         0.72-1.50
ApoAI (g/l)                         1.66 ± 0.16       1.89/2.00         1.12-1.75
ApoAII (g/l)                        0.32 ± 0.02       0.29/0.53         0.30-0.70
ApoE (mg/l)                         61.2 ± 12.2       65.0/55.0         13.0-76.0
ApoCII (mg/l)                       36.0 ± 5.3        36.0/61.0         7.0-63.0
ApoCIII (mg/l)                      26.7 ± 3.7        23.0/36.0         16.0-45.0
Creatinine (µM)                     91.5 ± 5.6        73.0/80.0         <124.0
Urea (mM)                           5.6 ± 0.8         6.6/1.0           3.6-8.9
Total protein (g/l)                 72.7 ± 1.7        77.2/84.0         65.0-85.0
Albumin (g/l)                       38.6 ± 1.0        38.5/43.5         37.0-53.0
Alanine aminotransferase
  (µmol/(l·s))                      0.39 ± 0.06       0.39/0.91         0.10-0.67
γ-glutamyl transferase
  (µmol/(l·s))                      0.54 ± 0.12       0.55/1.11         0.18-0.83
Bilirubin (µM), total               16.7 ± 5.2        13.7/24.3         1.0-16.0
Uric acid (µM)                      249 ± 28          317/359           208-418
Exocrine pancreatic function
  Amylase (U/l)                     56.8 ± 6.7        30.0/58.0         17.0-115.0
  Lipase (µmol/(l·s))               1.22 ± 0.40       0.20/3.00
Coagulation parameters
  Coagulation time (%)              117 ± 6           108/125           70-120
  Partial thromboplastin time (s)   33 ± 1            29/35             30-40
  Fibrinogen (g/l)                  3.54 ± 0.23       2.89/3.69         1.50-4.00
  Von Willebrand Factor
    Antigen (%)                     103 ± 11          145/115           70-200
  PAI-1 (ng/ml), total              36 ± 8            102/40            30-80
  tPA (ng/ml)                       10.6 ± 1.5        17.2/16.0         2.0-10.0
Urine analysis
  Creatinine (mM)                   8.36 ± 0.88       7.96/2.86         4.66-18.00
  Microalbumin (mg/24 h)            <2.2              13.5/<2.2         2.2-18.0

Values are means±SEM (standard error of the mean) or individual values. Reference values are those
from the Institute of Clinical Laboratory Diagnostics, University Clinic Carl Gustav Carus, Dresden.

Hepatic and renal function. 

HNF-4α is expressed in the liver and kidney and as such mutations in HNF-4α might be expected
to affect the normal function of these tissues (Sladek et al., 1990; Cereghini, 1996). In this regard,
HNF-4α regulates the expression of a number of apolipoproteins including AI, AIV, B and CIII (Cereghini,
1996). The serum apolipoprotein levels and lipoprotein fractions were normal in the subjects with the
R154X mutation except for lipoprotein(a) levels, which were elevated 3.3-fold (Table 12). Lipoprotein(a)
levels have been reported to be elevated in subjects with NIDDM in some studies (Nakagawa et al., 1996;
Hirata et al., 1995) but not others (Durlach et al., 1996; Chico et al., 1996). However, an elevation in
lipoprotein(a) levels in subjects with HNF-4α deficiency appears paradoxical as expression of
lipoprotein(a) is controlled by HNF-1α (Wade et al., 1994) which is in turn regulated by HNF-4α
(Cereghini, 1996). Thus, lower rather than higher lipoprotein(a) levels would be expected in subjects with
the R154X mutation. Further studies will be necessary to determine the relationship between
lipoprotein(a) levels and mutations in HNF-4α.

HNF-4α also regulates the expression of albumin, fibrinogen and the coagulation factors VII, VIII,
IX and X (Cereghini, 1996; Erdmann and Heim, 1995; Figueiredo and Brownlee, 1995; Naka and
Brownlee, 1996; Hung and High, 1996). The serum levels of albumin and fibrinogen and measurements of
coagulation time were normal in subjects with the R154X mutation (Table 12). HNF-4α is also expressed
in the kidney although the identity of the target genes in this organ is unknown (Sladek et al., 1990;
Cereghini, 1996). The urinary creatinine and microalbumin levels were normal in subjects with the R154X
mutation (Table 12), suggesting that renal function was not impaired in subjects with mutations in the
HNF-4α gene.

EXAMPLE 6 

Diminished Insulin and Glucagon Secretory Responses to Arginine in Nondiabetic Subjects
with a Mutation in the HNF-4α/MODY1 Gene

Nondiabetic subjects with the Q268X mutation in the hepatocyte nuclear factor (HNF)-4α/MODY1
gene have impaired glucose-induced insulin secretion. To ascertain the effects of the
nonglucose secretagogue arginine on insulin and glucagon secretion in these subjects, we studied 18
members of the RW pedigree: 7 nondiabetic mutation negative (ND[-]), 7 nondiabetic mutation positive
(ND[+]), and 4 diabetic mutation positive (DM). We gave arginine as a 5 g bolus followed by a 25
minute infusion at basal glucose concentrations and after glucose infusion to clamp plasma glucose at
~200 mg/dl. The acute insulin response (AIR), the 10-60 minute insulin area under the curve (AUC), and
the insulin secretion rate (ISR) were compared as were acute glucagon response (AGR) and glucagon
AUC. The ND[+] and DM groups had decreased insulin AUC and ISR and decreased glucose potentiation
of AIR, insulin AUC, and ISR to arginine administration when compared to the ND[-] group. At basal
glucose concentrations, glucagon AUC was greatest for ND[-], intermediate for ND[+], and lowest for the
DM group. During the hyperglycemic clamp there was decreased suppression of glucagon AUC for both
ND[+] and DM groups compared to the ND[-] group. The decreased ISR to arginine in the ND[+] group
compared to the ND[-] group, magnified by glucose potentiation, indicates that HNF-4α affects the
signaling pathway for arginine-induced insulin secretion. The decrease in glucagon AUC and decreased
suppression of glucagon AUC with hyperglycemia suggest that mutations in HNF-4α may lead to α-cell
as well as β-cell secretory defects or to a reduction in pancreatic islet mass.
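
An area under the curve such as the 10-60 minute insulin AUC referred to above is commonly computed with the trapezoidal rule. The following is a minimal sketch under that assumption; the text does not state which integration method was used, and the sampling times and hormone values below are hypothetical numbers chosen only to make the arithmetic easy to follow.

```python
def trapezoid_auc(times, values):
    """Area under a sampled curve by the trapezoidal rule.

    `times` must be strictly increasing and the same length as `values`.
    """
    if len(times) != len(values) or len(times) < 2:
        raise ValueError("need at least two paired samples")
    area = 0.0
    for i in range(1, len(times)):
        width = times[i] - times[i - 1]
        if width <= 0:
            raise ValueError("times must be strictly increasing")
        area += width * (values[i] + values[i - 1]) / 2.0
    return area


# Hypothetical samples: minutes after the arginine bolus and hormone concentration.
sample_times = [10, 30, 60]
sample_values = [4.0, 6.0, 2.0]

auc_10_60 = trapezoid_auc(sample_times, sample_values)
print(auc_10_60)
```

The units of the result are concentration multiplied by time, so AUC values can only be compared between groups when the same sampling window is used for every subject.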

1. Methods 

Subjects 

Eighteen members of the RW pedigree from branches II-2 and II-6 were
studied (Fajans, 1990; Fajans et al., 1994). The study was reviewed and approved by the institutional
review board of the University of Michigan and all subjects and/or parents provided
written informed consent. The glycemic status of each subject was determined by oral glucose tolerance
test (OGTT) as defined by the National Diabetes Data Group (NDDG) (1979). Each subject was originally
typed with a series of DNA markers on chromosome 20 to determine whether he or she had inherited the
extended at-risk haplotype (defined by alleles at the loci ADA, D20S17, D20S79, and D20S4) associated
with MODY1 (Bell et al., 1991; Bowden et al., 1992; Cox et al., 1992; Rothschild et al., 1993). When
the Q268X mutation in the HNF-4α gene was shown to be the cause of MODY1 in the RW pedigree
(Yamagata et al., 1996a), subjects were tested directly for this mutation. All the subjects included in this
study, except nondiabetic individual GM11626, have been tested for the presence of the Q268X
mutation. However, his nondiabetic father, IV-16, was tested and he does not have the Q268X mutation.
Based on the OGTT results and the presence or absence of the Q268X mutation or at-risk haplotype, the
family members were subdivided into three groups:

Nondiabetic Q268X mutation-negative group (ND[-])

Seven nondiabetic mutation-negative subjects were studied. GM identification numbers (Human
Genetic Mutant Cell Repository) as given by Bell et al. (1991), RW pedigree generation and person
numbers as given by Fajans et al. (1994), and age at the time of study were: GM10085, IV-22,
45 years; GM11429, IV-41, 32 years; GM11626, offspring of IV-16, 17 years; GM10153, offspring of
IV-17, 18 years; GM11579, offspring of IV-19, 16 years; GM11331, offspring of IV-21, 21 years;
and GM11333, offspring of IV-21, 22 years. Four of these subjects were offspring of diabetic
parents (GM10085, GM11429, GM10153, and GM11579).

Nondiabetic Q268X mutation-positive group (ND[+])

This group included seven subjects. Two subjects never had diabetes or impaired glucose
tolerance on OGTT: GM11090, offspring of IV-143, 16 years; and GM10668, offspring of IV-141,
16 years. Five subjects had previous abnormalities of glucose tolerance but none had ever had an
abnormal fasting plasma glucose or glycosylated hemoglobin concentration. Two had single diabetic
OGTTs 4 and 22 years, respectively, before the study but had numerous normal glucose tolerance
tests subsequently: GM10018, IV-168, 25 years; and GM8072, IV-143, 39 years. Three subjects had
fulfilled NDDG diagnostic criteria for diabetes by OGTT in the past. Prior to the study they had
normal OGTTs on 2, 4 and 5 occasions, over 2, 4 and 4 years, respectively. They were: GM11600,
offspring of IV-143, 14 years; GM8759, IV-166, 31 years; and GM8073, offspring of IV-143,
19 years.

Diabetic Q268X mutation-positive group (D[+])

The four subjects in this group had consistently diabetic OGTTs for 6 or more years or had mild
fasting hyperglycemia (<200 mg/dl) when untreated. They were GM8106, III-35, 59 years; GM7974,
IV-141, 43 years; GM8107, IV-165, 26 years; and GM10724, offspring of IV-142, 17 years. Subject
GM8106 was treated with tolbutamide between 1958 and 1968 and with chlorpropamide since May,
1995. When untreated, his highest fasting plasma glucose was 160 mg/dl and his highest total
glycosylated hemoglobin 9.1% (normal < 6.3%). On 100 mg of chlorpropamide per day, his fasting
plasma glucose was 91 mg/dl and glycosylated hemoglobin was 5.3%. Chlorpropamide was discontinued
for 26 days before the study and fasting plasma glucose was 99 mg/dl and total glycosylated
hemoglobin concentration was 5.8% on the day of the study. Subject GM7974 was treated with diet
alone. She had diabetic OGTTs intermittently since 1969; OGTTs were consistently diabetic since
1990. Her fasting plasma glucose was 84 mg/dl and her total glycosylated hemoglobin was 6.9% at
the time of the study. Subject GM8107's highest fasting plasma glucose was 192 mg/dl and highest
total glycosylated hemoglobin was 9.5% when untreated. When treated with glyburide 1.25 mg daily,
she had normal fasting and postprandial plasma glucose concentrations and a total glycosylated
hemoglobin of 6.7%. Glyburide was discontinued 11 days before the study. Her fasting plasma
glucose concentration was 106 mg/dl and her total glycosylated hemoglobin was 6.9% on the day of
the study. Subject GM10725 had been treated with [illegible] mg [illegible], and her [illegible]
concentration was 9.0%. She discontinued [illegible] 5 days before the study and her fasting
plasma glucose was 158 mg/dl and her total glycosylated hemoglobin was 7.7% at the time of the
study.
Protocol 

Subjects were studied at the University of Michigan General Clinical Research Center (CRC).
Subjects were admitted to the CRC in the evening and studied in the recumbent [illegible] after
a [illegible] hour overnight fast. An intravenous sampling catheter [illegible] vein of the hand
and the hand was kept in a [illegible] box thermostatically controlled [illegible]
arterialization of venous blood. A second catheter for insulin, arginine, and [illegible] was
inserted into the contralateral antecubital [illegible] with fasting [illegible] intravenous
bolus of human regular insulin (0.007 U/kg [illegible]) to lower the plasma glucose to
approximately 75 mg/dl.

Blood samples for measurement of basal glucose, C-peptide, [illegible] concentrations were
obtained at -30, -20, -10, and 0 minutes. At 0 minutes, arginine was administered. The total
arginine dose was calculated as 0.41 gm/kg body weight to a maximum of 30 grams. At time 0,
5 grams of arginine was administered as an IV bolus over 30 seconds and at time 5 minutes, the
remaining arginine was infused with a pump at a constant rate over 25 minutes. Samples were drawn
at 2, 3, 5, 7, 10, 20, and 30 minutes for measurement of glucose, insulin, C-peptide, and
glucagon. Following the first arginine bolus and infusion, there was a 60 minute washout period.
Blood samples for measurement of the same constituents were obtained at 40, 50, 60, 70, 80, and
90 minutes. At 90 minutes, glucose (160 mg/kg) was administered over 30 seconds and a variable
rate infusion of 20% dextrose with 10 mEq KCl/l was begun to clamp the plasma glucose level at
200 mg/dl [illegible] determined by frequent bedside blood glucose measurements. Blood samples
for the above constituents were obtained at 92, 93, 95, 97, 100, 110, 120, 130, 140, and
150 minutes. At 150 minutes, arginine (0.41 gm/kg, maximum 30 grams) was again administered as a
5 gram bolus followed after 5 minutes by an infusion over 25 minutes, as previously, and samples
were drawn at 152, 153, 155, 157, 160, 170, 180, 190, 200, 210, 220, 230, and 240 minutes for
measurement of glucose, insulin, C-peptide, and glucagon.

All blood samples were collected on ice and stored at -70°C until assayed. Plasma glucose was
measured on a Kodak Ektachem 700 Analyzer using a hexokinase method (intra-assay coefficient of
variation [CV] 1.7% at 5.0 mmol and 1.2% at 16.1 mmol). Immunoreactive insulin was measured by
double-antibody radioimmunoassay (RIA) (intra-assay CV 6.4%) (Hayashi et al., 1977). C-peptide
was measured by a specific RIA (intra-assay CV 3.9%) (Faber et al., 1978). Glucagon was measured
by double-antibody radioimmunoassay (intra-assay CV 3.2%) (Hayashi et al., 1977). All samples
were measured in duplicate and their means were used. Samples from individual subjects were
measured in a single assay. All assays were performed in the Michigan Diabetes Research and
Training Center Chemistry Core laboratory.
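
The weight-based arginine dosing used in the protocol can be illustrated with a short worked sketch. The Python below is illustrative only and is not part of the study procedures: the function name `arginine_plan` and the example body weights are assumptions, while the 0.41 g/kg dose, the 30 g ceiling, the 5 g bolus and the 25 minute infusion are taken from the protocol text.

```python
def arginine_plan(weight_kg, dose_g_per_kg=0.41, max_total_g=30.0,
                  bolus_g=5.0, infusion_min=25.0):
    """Split a weight-based arginine dose into a bolus and a constant-rate infusion.

    Hypothetical helper: the default values mirror the protocol text,
    everything else is an illustrative assumption.
    """
    # Total dose is weight-based but capped at the stated maximum.
    total_g = min(dose_g_per_kg * weight_kg, max_total_g)
    # Whatever remains after the bolus is infused at a constant rate.
    infused_g = max(total_g - bolus_g, 0.0)
    return {
        "total_g": total_g,
        "bolus_g": min(bolus_g, total_g),
        "infused_g": infused_g,
        "rate_g_per_min": infused_g / infusion_min,
    }


# Example body weights (hypothetical subjects).
plan_60kg = arginine_plan(60.0)   # below the 30 g ceiling
plan_80kg = arginine_plan(80.0)   # 0.41 * 80 = 32.8 g, capped at 30 g
```

Under these assumptions a 60 kg subject would receive 24.6 g in total (5 g bolus plus 19.6 g infused), whereas an 80 kg subject would reach the 30 g ceiling.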

Data analysis 

Acute insulin responses (AIR), acute C-peptide responses (ACR), and acute glucagon responses
(AGR) were calculated as the mean of the 2, 3, 4, and 5 minute hormone levels minus the mean of
the -10, -5, and 0 minute hormone levels. Glucose, insulin, C-peptide, and glucagon areas under
the curve were calculated with the trapezoidal rule for the time interval 10 to 60 minutes when
the arginine bolus was administered at time 0 and the arginine infusion began at time 5 minutes.
Baseline values, calculated as the mean hormone levels measured at -10, -5, and 0 minutes
immediately preceding the arginine bolus, were subtracted from the areas under the curve. Insulin
secretion rates were calculated by deconvolution of C-peptide values (Polonsky et al., 1986). All
of these indices of insulin secretion were assessed during arginine administration at baseline
glucose levels, during glucose administration, and during arginine administration during the
hyperglycemic clamp. Slope of potentiation was calculated as the difference between the AIR or
ACR to arginine obtained during the hyperglycemic clamp and at baseline glucose levels divided by
the difference between these two glucose levels (Halter et al., 1979). Results are expressed as
means ± standard error of the mean. Statistical significance of differences among groups was
assessed with chi-square and unpaired t-tests. The primary comparisons of interest were between
the ND[-] and ND[+] groups. P < 0.05 was defined as the limit of statistical significance.
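
The three derived indices described above (acute response, baseline-corrected area under the curve, and slope of potentiation) can be sketched in a few lines of Python. This is a minimal illustration rather than the inventors' analysis code: the function names and all sample values are hypothetical, and subtracting the baseline from each level before integrating is one reading of "baseline values were subtracted from the areas under the curve".

```python
def mean(values):
    return sum(values) / len(values)


def acute_response(samples):
    """Mean of the 2, 3, 4 and 5 minute levels minus the mean of the
    -10, -5 and 0 minute levels. `samples` maps minute -> hormone level."""
    stimulated = mean([samples[t] for t in (2, 3, 4, 5)])
    baseline = mean([samples[t] for t in (-10, -5, 0)])
    return stimulated - baseline


def incremental_auc(samples, baseline, start=10, end=60):
    """Trapezoidal area between `start` and `end` minutes, computed on
    baseline-subtracted levels (an assumed interpretation of the text)."""
    points = sorted((t, level - baseline)
                    for t, level in samples.items() if start <= t <= end)
    return sum((t1 - t0) * (y0 + y1) / 2.0
               for (t0, y0), (t1, y1) in zip(points, points[1:]))


def slope_of_potentiation(resp_clamp, resp_basal, glucose_clamp, glucose_basal):
    """Change in acute response per unit change in glucose level."""
    return (resp_clamp - resp_basal) / (glucose_clamp - glucose_basal)


# Hypothetical insulin profile (minute -> level), not data from the study.
insulin = {-10: 10, -5: 10, 0: 10, 2: 50, 3: 60, 4: 55, 5: 45,
           10: 40, 20: 50, 30: 60, 40: 40, 50: 30, 60: 20}

air = acute_response(insulin)
baseline = mean([insulin[t] for t in (-10, -5, 0)])
auc = incremental_auc(insulin, baseline)
slope = slope_of_potentiation(250.0, 50.0, 200.0, 100.0)
```

Note that a slope computed from group means in this way need not equal the mean of per-subject slopes, which is presumably how the values reported in the Results were obtained.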

2. Results

Eighteen members of the RW pedigree were studied: seven non-diabetic mutation negative (ND[-]),
seven non-diabetic mutation positive (ND[+]), and four diabetic mutation positive (D[+])
(Table 13). There were no significant differences among groups with regard to [illegible],
although D[+] [illegible] tended to be older. All subjects were non-obese. Fasting glucose and
insulin levels did not differ significantly among groups although [illegible]. Fasting C-peptide
levels were lower in D[+] subjects compared to ND[-] subjects. Fasting glucagon levels did not
differ among groups. Glycosylated hemoglobin did not differ between the two nondiabetic groups,
but was higher in the D[+] group.

Table 13

Characteristics of Subjects from RW Pedigree by Glucose Tolerance and Mutation Status

Glucose Tolerance            Nondiabetic      Nondiabetic      Diabetic
Genotype*                    [-]              [+]              [+]

Number and gender (M/F)      5/2              [illegible]      1/3
Age (years)                  24 ± 4           23 ± 4           36 ± 9
Body Mass Index (kg/m2)      25.2 ± 1.5       23.1 ± 1.0       22.5 ± 0.4
Fasting glucose (mg/dl)      91 ± 2           87 ± 2           112 ± 16
Fasting insulin (µU/ml)      10 ± 1           11 ± 2           7 ± 1
Fasting C-peptide (ng/ml)    1.8 ± 0.1**      1.6 ± 0.2        1.3 ± 0.2
Fasting glucagon (pg/ml)     73 ± 6           64 ± 9           77 ± 12
Glycosylated hemoglobin      5.5 ± 0.1**      5.7 ± 0.2**      7.8 ± 0.4

*[-] = Normal/Normal
 [+] = Normal/Q268X Mutation
**p < 0.05 vs. diabetic [+]
All values are mean ± SEM
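
As a consistency check, the age row of Table 13 can be recomputed from the individual ages listed under Subjects. The sketch below is illustrative: the age lists are transcribed from the subject descriptions in this example (some digits in the source are hard to read, so the transcription is an assumption), and `mean_sem` is a hypothetical helper using the sample standard deviation divided by the square root of n.

```python
from math import sqrt
from statistics import mean, stdev

# Ages at time of study, transcribed from the subject descriptions (assumed reading).
AGES = {
    "ND[-]": [45, 32, 17, 18, 16, 21, 22],
    "ND[+]": [16, 16, 25, 39, 14, 31, 19],
    "D[+]": [59, 43, 26, 17],
}


def mean_sem(values):
    """Return (mean, standard error of the mean) for a list of numbers."""
    return mean(values), stdev(values) / sqrt(len(values))


# Round to whole years, as in the table.
summary = {group: tuple(round(x) for x in mean_sem(ages))
           for group, ages in AGES.items()}
```

With these inputs the rounded results are 24 ± 4, 23 ± 4 and 36 ± 9 years, in agreement with the age row of the table.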



FIG. 19 demonstrates the protocol and illustrates concentrations of glucose (FIG. 19A), insulin
(FIG. 19B), C-peptide (FIG. 19C), and glucagon (FIG. 19D) during the three phases of the study.
These were: A, administration of arginine (bolus and infusion) at basal glucose concentrations;
B, administration of glucose (bolus and variable rate infusion) to clamp the glucose level at
200 mg/dl; and C, administration of arginine (bolus and infusion) during the hyperglycemic clamp.

Table 14 summarizes average glucose levels; acute insulin responses (AIR) and C-peptide
responses (ACR) to arginine; and hormone areas under the curve (AUC) and insulin secretion rate
(ISR) measured 10 to 60 minutes following commencement of the three study phases. These are
A) administration of arginine at basal glucose concentrations, B) administration of glucose, and
C) administration of arginine during the hyperglycemic clamp.

Table 14. Plasma Concentrations of Glucose, Acute Insulin and C-peptide Responses (AIR and
ACR), Areas Under the Curve (AUC 10-60 minutes) for Insulin and C-peptide, and Insulin
Secretion Rate (ISR) during administration of A) Arginine at basal glucose concentrations (Bolus
and Infusion), B) Glucose (Bolus and Infusion) and C) Arginine (Bolus and Infusion) during
hyperglycemic clamp.



Period/Group            Nondiabetic [-]     Nondiabetic [+]     Diabetic [+]
Number                  n = 7               n = 7               n = 4

A. Arginine administration at basal glucose concentration
Glucose (mg/dl)*        107 ± 3             102 ± 2             115 ± 15
AIR (µU/ml)             48 ± 10             70 ± 19             27 ± 7
ACR (ng/ml)             3.05 ± 0.61         3.25 ± 0.44         2.19 ± 0.55
AUC I (ng/ml)           78.5 ± 7.7          25.6 ± 5.5          3.5 ± 0.[illegible]
AUC C (ng/ml)           205 ± 12            71 ± 8              38 ± 6
ISR (µg)                76 ± 6              31 ± 3"             16 ± 3<

B. Glucose administration
Glucose (mg/dl)*        207 ± 2             207 ± 5             203 ± 7
AIR (µU/ml)             72 ± 10             63 ± 15             [illegible] ± 6
ACR (ng/ml)             4.03 ± 0.61         2.83 ± 0.54         1.25 ± 0.58
AUC I (ng/ml)           43.9 ± 6.3          47.1 ± 11.4         16.1 ± 4.1
AUC C (ng/ml)           131 ± 12            103 ± 16            81 ± 22
ISR (µg)                63 ± 4              51 ± 6              33 ± 2 1

C. Arginine administration during hyperglycemic clamp
Glucose (mg/dl)*        198 ± 2             209 ± 7             201 ± 6
AIR (µU/ml)             271 ± 33            162 ± 36"           [illegible]
ACR (ng/ml)             10.33 ± 1.31        5.87 ± 0.72«        3.21 ± 0.91'
AUC I (ng/ml)           628 ± 69            149 ± 40            25 ± 7
AUC C (ng/ml)           739 ± 52            209 ± 40            109 ± 42
ISR (µg)                276 ± 18            101 ± 19*           54 ± 16



* mean for period 10-60 minutes
All values are mean ± SEM

- p < 0.05, " p < 0.01, ' p < 0.001, ND[+] vs ND[-]

' p < 0.05, 1 p < 0.01, * p < 0.001, D[+] vs ND[-]

s p < 0.05, D[+] vs ND[+]

Effects of Arginine and Glucose on Insulin Secretion

Administration of Arginine at Basal Glucose Concentrations

At baseline, glucose levels did not differ among the groups (Table 13). After the 5 g arginine
bolus, AIR and ACR did not differ among groups but tended to be lower for the D[+] group
(Table 14). During and after the subsequent arginine infusion, glucose levels were slightly
higher at 10, 20 and 30 minute intervals in the ND[-] as compared to the ND[+] group (FIG. 19),
but the average glucose levels during the 10-60 minute time interval (Table 14) and the glucose
area under the curve (1171 ± 99 vs 1012 ± 141 mg/dl, respectively, p = 0.37) did not differ.
Insulin and C-peptide levels rose to a peak at 30 minutes in the ND[-] group but were markedly
decreased in both the ND[+] and D[+] groups (FIG. 19). The insulin area under the curve (AUC I)
and C-peptide area under the curve (AUC C) were significantly reduced in the ND[+] group compared
to the ND[-] group (Table 14). They were further reduced in the D[+] group compared to the ND[+]
group (Table 14). ISR was significantly reduced in ND[+] compared to ND[-] subjects and further
reduced in D[+] compared to ND[+] subjects (Table 14).
Administration of Glucose

Glucose levels did not differ among the groups during the bolus and the variable rate glucose
infusion (Table 14). AIR and ACR to glucose did not differ between the ND[+] and ND[-] groups but
were significantly reduced in the D[+] group compared to the ND[-] group (FIG. 19, Table 14).
AUC I, AUC C and ISR during the glucose infusion did not differ between the ND[-] and ND[+]
groups (Table 14). They were reduced in the D[+] group compared to the ND[-] group (Table 14).

Administration of Arginine during the Hyperglycemic Clamp 

Glucose levels did not differ among the groups during the variable rate glucose infusion and
second arginine bolus and infusion (Table 14). At hyperglycemic plasma glucose levels, as
compared to euglycemic levels, AIR and ACR to arginine, and AUC I, AUC C and ISR were enhanced
and differences among groups were greatly magnified (FIG. 19, Table 14). All indices of insulin
secretion were significantly reduced in the ND[+] group compared to the ND[-] group and there was
a further reduction in the D[+] group (Table 14).

FIG. 20A and FIG. 20B demonstrate the slopes of potentiation for insulin and C-peptide,
respectively. Glucose potentiation of arginine-stimulated insulin secretion was reduced in both
the ND[+] (0.80 ± 0.18) and D[+] (0.24 ± 0.04) groups compared to the ND[-] group (2.12 ± 0.25,
p < 0.001). The insulin slope of potentiation was also reduced in the D[+] group compared to the
ND[+] group (p < 0.05). Glucose potentiation of arginine-stimulated C-peptide secretion was also
reduced in the ND[+] (0.02 ± 0.00) and D[+] (0.01 ± 0.00) groups compared to the ND[-] group
(0.07 ± 0.01, p < 0.01).
Effects of Arginine on Plasma Glucagon Concentrations

At baseline, glucagon levels did not differ among groups (Table 13). Acute glucagon responses to
the 5 g bolus of arginine administered at basal glucose concentrations did not differ
significantly among ND[-], ND[+], and D[+] groups (104 ± 19, 92 ± 16, and 82 ± 23 pg/ml,
respectively). On the other hand, the glucagon area under the curve (10-60 minutes) during and
following the arginine infusion at basal glucose concentrations was reduced in D[+] compared to
ND[-] subjects (4,778 ± 1,087 vs. 7,549 ± 639 pg/ml, p < 0.05). ND[+] subjects showed
intermediate values (5,772 ± 734 pg/ml; p = 0.09 vs. ND[-] group). During the hyperglycemic
clamp there were no significant differences among glucagon areas under the curve for any of the
groups (4,237 ± 406, 3,963 ± 508, and 2,941 ± 568 pg/ml, for ND[-], ND[+] and D[+],
respectively). To assess the impact of glucose infusion on the glucagon response to arginine in
the three study groups, the inventors assessed the differences in glucagon area under the curve
between the euglycemic and hyperglycemic periods. Decreases in glucagon areas induced by the
hyperglycemic clamp between the first and the second arginine infusion were 3312 ± 404,
1809 ± 387, and 1836 ± 535 pg/ml for the ND[-], ND[+] and D[+] groups, respectively (p < 0.02,
ND[+] vs. ND[-]).
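
The reported decreases in glucagon area can be cross-checked against the group means given in the same paragraph. The sketch below is a simple arithmetic illustration; it assumes that the three groups are listed in the order ND[-], ND[+], D[+] throughout the paragraph, and the dictionary names are hypothetical. It works on group means only, so the standard errors of the differences cannot be reproduced this way.

```python
# Mean glucagon AUC (pg/ml) per group, as quoted in the text (assumed group order).
AUC_BASAL = {"ND[-]": 7549, "ND[+]": 5772, "D[+]": 4778}   # arginine at basal glucose
AUC_CLAMP = {"ND[-]": 4237, "ND[+]": 3963, "D[+]": 2941}   # arginine during the clamp

# Suppression of the glucagon response by hyperglycemia, per group.
decrease = {group: AUC_BASAL[group] - AUC_CLAMP[group] for group in AUC_BASAL}
```

The differences of the means are 3312, 1809 and 1837 pg/ml; the first two match the reported values exactly and the third differs from the reported 1836 by one unit, presumably through rounding of the individual means.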

EXAMPLE 7 

MODY Due to Mutations in the HNF-4α Binding Site in the HNF-1α Gene Promoter

Recent studies have shown that mutations in the transcription factor hepatocyte nuclear factor
(HNF)-1α are the cause of one form of maturity-onset diabetes of the young, MODY3. These studies
have identified mutations in the mRNA and protein coding regions of this gene that result in the
synthesis of an abnormal mRNA or protein. Here, the inventors report an Italian family in which
an A→C substitution at nucleotide -58 of the promoter region of the HNF-1α gene cosegregates with
MODY. This mutation is located in a highly conserved region of the promoter and disrupts the
binding site for the transcription factor HNF-4α, mutations in the gene encoding HNF-4α being
another cause of MODY (MODY1). This result demonstrates that decreased levels of HNF-1α per se
can cause MODY. Moreover, it indicates that both the promoter and coding regions of the HNF-1α
gene should be screened for mutations in subjects thought to have MODY because of mutations in
this gene.

Subjects

The MODY family Italy-1 was ascertained through the diabetes clinic of Santo Spirito's Hospital.
Affection status was determined using criteria of the National Diabetes Data Group. The affection
status of unaffected family members was defined as normal or impaired based on the results of a
standard 75 g OGTT. This study had institutional approval and all subjects gave informed consent.



Family members were genotyped with the markers D12S321, D12S76 and UC-39, all of which are
tightly linked to the HNF-1α gene (MODY3) (Yamagata et al., 1996). The forward and reverse
primers for the polymorphic sequence tagged site (STS) UC-39 are
5'-GCAACAGAGCAAGACTCCATCTCA-3' (SEQ ID NO:122) and 5'-GAGTTTAATGGAAGAACTAACC-3' (SEQ ID NO:123),
respectively, and the PCR included initial denaturation at 94°C for 5 min and 35 cycles of
denaturation at 94°C for 1 min, annealing at 63°C for 1 min and extension at 72°C for 1 min with
a final extension at 72°C for 10 min. The forward primer was labeled with 32P and the MgCl2
concentration in the reaction was 1.0 mM. The PCR was carried out in a GeneAmp 9600 PCR System
(Perkin Elmer, Norwalk, CT). The PCR products were separated by electrophoresis on a 5%
polyacrylamide sequencing gel and visualized by autoradiography. Tests for linkage were carried
out using the haplotype formed from D12S321, D12S76 and UC-39 and assuming a recombination
frequency between adjacent markers of 0.001 with the computer program MLINK from the LINKAGE
package (version 5.1) (Lathrop et al., 1985). The frequencies of the haplotypes were estimated
from the data. The analysis assumed a disease allele frequency of 0.001 and two liability
classes. Liability class 1 included individuals whose age was >25 years with penetrances of
0.00, 0.95 and 0.95 for the normal homozygote, heterozygote and susceptible homozygote,
respectively. Liability class 2 included individuals <25 years of age with penetrances of 0.00,
0.50 and 0.95 for the normal homozygote, heterozygote and susceptible homozygote, respectively.
The affection status of the one subject with impaired glucose tolerance was coded as unknown.
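
The two age-dependent liability classes amount to a small lookup table, which can be sketched as follows. This is an illustration only and is not the MLINK input format: the names `PENETRANCE`, `liability_class` and `penetrance` are hypothetical, and because the text does not say how a subject aged exactly 25 years was classified, placing such a subject in class 1 is an assumption.

```python
# Penetrance by liability class, indexed by number of disease alleles (0, 1, 2).
PENETRANCE = {
    1: (0.00, 0.95, 0.95),  # class 1: age > 25 years
    2: (0.00, 0.50, 0.95),  # class 2: age < 25 years
}


def liability_class(age_years):
    # Assumption: exactly 25 years is treated as class 1 (not stated in the text).
    return 1 if age_years >= 25 else 2


def penetrance(age_years, disease_alleles):
    """Probability of being affected given age and genotype."""
    return PENETRANCE[liability_class(age_years)][disease_alleles]


young_carrier = penetrance(18, 1)   # heterozygote under 25 years
older_carrier = penetrance(40, 1)   # heterozygote over 25 years
non_carrier = penetrance(18, 0)     # normal homozygote
```

The lower penetrance assigned to young heterozygotes reflects the possibility that a carrier under 25 years has simply not yet developed diabetes.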

Identification of mutations 

Each exon and minimal promoter region of the HNF-1α gene of subjects II-5 and III-1 were
screened for mutations as described previously (Yamagata et al., 1996; Kaisaki et al., 1997).
The mutation was confirmed by cloning the PCR product into pGEM-4Z and sequencing clones derived
from both alleles. The presence of the mutation in other family members and unrelated
nondiabetic subjects was tested by PCR amplification of the proximal promoter region and direct
sequencing.

2. Results 

Linkage studies

The NIDDM in the pedigree Italy-1 has the clinical features of MODY including autosomal
dominant inheritance and age at diagnosis <25 years in multiple family members (FIG. 21). The
six affected members are treated with either insulin (individuals II-1, II-5 and III-9) or oral
hypoglycemic agents (II-7, III-1 and III-2). The three subjects on insulin therapy showed
evidence of diabetic complications including retinopathy (II-1 and II-5) and nephropathy (III-9).
One member of this pedigree, III-6, has impaired glucose tolerance.

The polymorphic markers D12S321, D12S76 and UC-39, which are closely linked to the HNF-1α gene
(order: cen - D12S321 - D12S76 - HNF-1α - UC-39 - qter), were typed in this family. The haplotype
3-3-7 co-segregated with MODY with no obligate recombinants (FIG. 21). One subject with IGT (age,
18 years) also inherited this haplotype, as did two unaffected young women, individuals III-5 and
III-13, of 21 and 14 years of age, respectively. These three subjects may be at risk of
developing diabetes in the future. The LOD score in this family was 1.28 at a recombination
fraction of 0.00. Although this LOD score does not meet formal criteria for establishing linkage
(i.e., the LOD score is <3.0), the p-value associated with the evidence for linkage is 0.008,
which is sufficient to justify a search for mutations in the HNF-1α gene.
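
The relationship between the LOD score of 1.28 and the quoted p-value of 0.008 can be illustrated with a commonly used conversion. The text does not state how the p-value was obtained; the sketch below assumes the standard approach in which 2 ln(10) × LOD is treated as a chi-square statistic with one degree of freedom and the test is one-sided. The function name `lod_to_p` is hypothetical.

```python
import math


def lod_to_p(lod):
    """Approximate one-sided p-value for a LOD score.

    Assumes 2 * ln(10) * LOD follows a chi-square distribution with 1 degree
    of freedom under the null hypothesis of no linkage.
    """
    chi2 = 2.0 * math.log(10.0) * lod
    # Upper tail of a 1-df chi-square: P(X > x) = erfc(sqrt(x / 2)).
    two_sided = math.erfc(math.sqrt(chi2 / 2.0))
    return 0.5 * two_sided


p_family = lod_to_p(1.28)     # LOD score observed in the Italy-1 pedigree
p_threshold = lod_to_p(3.0)   # conventional threshold for declaring linkage
```

Under this assumption a LOD score of 1.28 corresponds to a p-value of roughly 0.008, consistent with the figure quoted above, while the conventional threshold of 3.0 corresponds to a p-value on the order of 0.0001.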

Mutation screening.

Two diabetic subjects, II-5 and III-1, were screened for mutations in the HNF-1α gene. No
mutations were found on screening the mRNA/protein coding regions, exons 1-10, although the
subjects were heterozygous for several previously described polymorphisms (Yamagata et al.,
1996). Since no mutations were found in the coding region of the HNF-1α gene, the proximal
promoter region was screened. This analysis revealed that both affected subjects were
heterozygous for an A→C substitution at nucleotide -58, which is located in a highly conserved
region of the promoter of the HNF-1α gene that includes the binding site for HNF-4α (FIG. 22)
(Tian and Schibler, 1991; Kuo et al., 1992). Since this mutation does not lead to gain or loss of
a site for a restriction endonuclease, it was tested for by PCR amplification and direct
sequencing. The A→C substitution at nucleotide -58 co-segregated with the at-risk haplotype in
the Italy-1 pedigree (FIG. 21) and was not present in a sample of 50 unrelated white subjects,
implying that it is the mutation responsible for MODY in this family.

EXAMPLE 8 
Mutation in HNF-1β associated with MODY

HNF-1α and HNF-4α are members of a complex transcriptional regulatory network which includes
other homeodomain proteins and nuclear receptors as well as members of the forkhead/winged helix
and leucine zipper CCAAT/enhancer binding protein families (Cereghini, 1996). The inventors have
screened two other members of this network, HNF-1β (Mendel et al., 1991a; De Simone et al., 1991;
Rey-Campos et al., 1991; Bach and Yaniv, 1993) and the bifunctional protein dimerization cofactor
of HNF-1 (DCoH)/pterin-4α-carbinolamine dehydratase (PCBD) (Mendel et al., 1991b; Citron et al.,
199[illegible]), for mutations in Japanese subjects with MODY. No diabetes-associated mutations
were found in DCoH. However, the inventors found one subject with a nonsense mutation, R177X, in
HNF-1β which co-segregated with early-onset diabetes. The identification of mutations in three
members of the HNF family of transcription factors indicates the importance of this regulatory
network in the maintenance of glucose homeostasis.

1. Methods 



The study population consisted of 57 unrelated Japanese subjects attending the Diabetes Clinic
of Tokyo Women's Medical College who were diagnosed with NIDDM before 25 years of age and/or who
were members of families in which NIDDM was present in three or more generations: age at
diagnosis, 20.1 ± 7.5 years (mean ± SE); male/female, 31/26; and treatment, insulin = 36, oral
hypoglycemic agents = 10, and diet = 11. These subjects had been screened for mutations in the
HNF-1α/MODY3 gene and all were negative for mutations in this gene (Lazzaro et al., 1992).
Thirty-two of the subjects met strict criteria for a diagnosis of MODY (i.e., NIDDM in at least
three generations with autosomal dominant transmission and diagnosis before 25 years of age in
at least one affected subject). NIDDM was diagnosed using the criteria of the World Health
Organization (Bennett, 1994). At the time of recruitment, informed consent was obtained from each
subject and a blood sample was taken for DNA isolation. Fifty-three unrelated nondiabetic
Japanese subjects were tested for each nucleotide substitution and mutation to determine if the
sequence change was a polymorphism or disease-associated mutation.

Pedigree J2-20.

The proband (subject III-2, FIG. 25) presented with glucosuria at 10 years of age and was
hospitalized. She was diagnosed with diabetes and treated with insulin for two days and then with
diet only for two years. At 12 years of age, she resumed insulin therapy (28 U/day). She came to
clinical attention again at 21 years because of a pyelonephritis and poorly controlled diabetes.
At 23 years of age, she was admitted to the hospital of Tokyo Women's Medical College because of
blurred vision. Her urine C-peptide levels at this time were 3.2 µg/day (normal, 50 ± 25 µg/day),
indicating low insulin secretory capacity. Despite persistent high blood glucose levels, she had
no history of ketosis. The subject was diagnosed with NIDDM based on her clinical course. Subject
III-3 presented with general fatigue at 15 years of age. He had gained 15 kg during the previous
three months and his weight at the time of presentation was 75 kg. He was diagnosed with diabetes
and was treated first with insulin and then diet and exercise. He was well controlled when he
maintained his weight at 60 kg. At 18 years of age, he had gained weight again and insulin
treatment was initiated. His urinary C-peptide at this time was 57.5 µg/day with fasting
C-peptide and glucose levels of 2.4 ng/ml and 106 mg/dl, respectively. There was no history of
ketosis and he was diagnosed with NIDDM. He presently shows diminished pancreatic β-cell function
with no increase in C-peptide levels following administration of glucagon. All individuals shown
in FIG. 25 were invited to participate in this study but many declined to do so.

Isolation and partial sequence of human HNF-1β gene.

The PAC clone 319P12 containing the human HNF-1β gene was isolated from a library (Genome
Systems, St. Louis, MO) by screening PAC DNA pools using polymerase chain reaction (PCR™) and the
primers vHNFP1 (5'-CCTCATGGAGAAACATCCTAAGT-3') (SEQ ID NO:124) and vHNFP2
(5'-AGGGAGTGCACGGCTGAGCTCCTG-3') (SEQ ID NO:125). The sequences of the exons, flanking introns
and promoter region were determined by sequencing PCR™ products and appropriate restriction
fragments cloned into pGEM®-4Z (Promega, Madison, WI) with an AmpliTaq FS Dye Terminator cycle
sequencing kit (Perkin-Elmer, Norwalk, CT) and ABI Prism™ 377 DNA sequencer. Primers for PCR™ and
sequencing were selected using the exon-intron organization of the human HNF-1α gene (Yamagata
et al., 1996a) as a guide since related genes often have similar exon-intron organizations. The
partial sequence of the human HNF-1β gene including promoter has been deposited in [illegible]
under accession numbers U90279-90287 and U96079.
Mutation screening. 

The nine exons, flanking introns and minimal promoter region of the HNF-1β gene [illegible]
using PCR™ and specific primers (Table 17) and the PCR™ products were sequenced from
[illegible] as described above. PCR™ for exon 1 was carried out using eLONGase Enzyme Mix (Life
Technologies, Grand Island, NY) with denaturation at 94°C for 1 min, followed by 35 cycles of
denaturation at 94°C for 30 s, annealing at 55°C for 30 s and extension at 68°C for 1 min, and
final extension at 68°C for 10 min. PCR™ for exons 2-9 was carried out using Taq DNA polymerase
and 1.5 mM MgCl2 with denaturation at 94°C for 5 min followed by 35 cycles of denaturation at
94°C for 30 s, annealing at 60°C for 30 s and extension at 72°C for 30 s, and final extension at
72°C for 10 min. The sequence of each mutation was confirmed by cloning the PCR™ product into
pGEM®-T Easy (Promega, Madison, WI) and sequencing clones representing both alleles. Exons 2-4 of
the DCoH gene were amplified using Taq DNA polymerase/1.5 mM MgCl2 and specific primers
(Table 16) and sequenced as described above. Exon 1 of the DCoH gene encoding the 5'-untranslated
region and the initiating Met was refractory to PCR™ amplification and therefore was not screened
for mutations. The [illegible] polymorphism in other individuals was determined by PCR-RFLP
analysis if it resulted in the gain/loss of a site for a restriction endonuclease, or PCR™ and
direct sequencing if there was no change in a site.



The human HNF-1β (STS WI-7310) and DCoH genes were mapped and confirmed to YACs 969C9
(chromosome 17) (Schuler et al., 1996) and 849H3 (chromosome 10), respectively. The adjacent
polymorphic STSs D17S1788 and D10S1688 were tested for linkage with NIDDM in Japanese affected
sib pairs (258 and 268 possible pairs, respectively). In the genome-wide screen of Mexican
American affected sib pairs 23, the HNF-1β and DCoH genes are in the intervals
D17S1293-D17S1299 and D10S589-D10S535, respectively (Schuler et al., 1996).

Transaction studies of normal and mutant human HKftb. 

The construct pcDNA3.1-HNF-1β was prepared by cloning the type A human HNF-1β cDNA
(nucleotides 195-2783 inclusive, GenBank Accession No. X58840; SEQ ID NO:128) into pcDNA3.1
(Invitrogen, Carlsbad, CA). The R177X mutation was introduced by site-directed mutagenesis
(QuikChange™ mutagenesis kit; Stratagene, La Jolla, CA) to generate pcDNA3.1-HNF-1β R177X. The
reporter gene construct pGL3-RA was prepared by cloning the promoter of the rat albumin gene,
nucleotides -170 to +5 (Ringeisen et al., 1993), into the firefly luciferase reporter vector pGL3-Basic
(Promega, Madison, WI). The sequences of all constructs were confirmed. HeLa cells were transfected
for 5 hr using LipofectAMINE™ (GIBCO BRL, Gaithersburg, MD) with 500 ng of pGL3-RA, 250 ng of
pcDNA3.1-HNF-1β or pcDNA3.1-HNF-1β R177X, and 25 ng of pRL-SV40 to control for efficiency of
transfection. pcDNA3.1+ DNA was added to each transfection so that the final amount of DNA added
was 2 µg. After 24 h, the transactivation activity of the normal and mutant HNF-1β proteins was
measured using the Dual-Luciferase™ Reporter Assay System (Promega, Madison, WI).
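
The DNA amounts in the transfection mix described above reduce to simple arithmetic. The following is a minimal sketch, assuming the final amount of DNA per transfection is 2 µg (2000 ng), which is how the stated total is read here. The function name `filler_dna_ng` and the dictionary keys are illustrative only and do not appear in the specification.

```python
def filler_dna_ng(components_ng, total_ng=2000.0):
    """Return the amount of empty-vector DNA (ng) needed to reach total_ng."""
    used = sum(components_ng.values())
    if used > total_ng:
        raise ValueError("components already exceed the target total")
    return total_ng - used


# Amounts per transfection as listed in the text (ng).
mix = {
    "pGL3-RA reporter": 500.0,
    "HNF-1beta expression construct": 250.0,
    "pRL-SV40 control": 25.0,
}

# Assumption: the target total is 2000 ng (2 µg).
filler = filler_dna_ng(mix)
print(f"empty vector DNA to add: {filler:.0f} ng")
```

Under that assumption, each transfection would be topped up with 1225 ng of empty vector, so every well receives the same DNA mass regardless of which expression construct is used.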

2. Results

The nine exons, flanking introns and minimal promoter region of the human HNF-1β gene (TCF2),
which encode all forms of HNF-1β, were screened for mutations in 57 unrelated Japanese subjects with
MODY. This analysis revealed four nucleotide substitutions: a C→T substitution in codon 177 (exon 2) in
the proband from family J2-20, which generated a nonsense mutation CGA (Arg)→TGA (OP) (R177X) (FIG.
24); an uncommon silent mutation in codon 463 (exon 7), for which one subject was homozygous; and two
polymorphisms in intron 8 (Table 15), neither of which is predicted to affect RNA splicing. The nonsense
mutation R177X was not found on screening 53 unrelated non-diabetic Japanese subjects. One
nondiabetic subject was heterozygous for the silent mutation in codon 463 (Table 15).
Table 15

Mutations and DNA polymorphisms in human HNF-1β and DCoH genes

        Location                                          Frequency
Site        Codon      Nucleotide             Patients (n=57)     Controls

A. HNF-1β
Exon 2      177        CGA(Arg)→TGA(OP)       C-0.99; T-0.01      C-1.00; T-0.00
Exon 7      463        GCC(Ala)→GCT(Ala)      C-0.98; T-0.02      C-0.99; T-0.01
Intron 8    nt 48      Insertion C            C-0.12              C-0.17
Intron 8    nt -22     C→T                    C-0.71; T-0.29      C-0.68; T-0.32

B. DCoH
Exon 4      nt 9306    A→G                    A-0.82              A-0.80; G-0.20

DNA polymorphisms found in introns are noted relative to the splice donor or acceptor site; nt,
nucleotide. In the HNF-1β gene, the C→T substitution in codon 463 and the C-insertion polymorphism in
intron 8 nt 48 result in the gain of a Dde I site and loss of a Nae I site, respectively. In the human DCoH gene
(GenBank accession no. L41560, incorporated herein by reference), nt 9306 is in the region encoding
the 3'-untranslated region of DCoH mRNA and is 36 nucleotides after the translation termination codon.
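
The allele frequencies in Table 15 follow from genotype counts over 2n chromosomes. The sketch below is illustrative only: it assumes that the R177X proband is the sole heterozygous carrier among the 57 patients, and that the codon 463 variant is carried only by the one homozygous patient and the one heterozygous control mentioned in the text. The function name `allele_frequency` is hypothetical.

```python
def allele_frequency(n_subjects, n_heterozygous=0, n_homozygous=0):
    """Frequency of the variant allele among 2 * n_subjects chromosomes."""
    if n_heterozygous + n_homozygous > n_subjects:
        raise ValueError("more carriers than subjects")
    return (n_heterozygous + 2 * n_homozygous) / (2 * n_subjects)


# Assumed genotype counts (see the lead-in above).
r177x_patients = allele_frequency(57, n_heterozygous=1)
codon463_patients = allele_frequency(57, n_homozygous=1)
codon463_controls = allele_frequency(53, n_heterozygous=1)

for label, freq in [
    ("R177X T allele, patients", r177x_patients),
    ("codon 463 T allele, patients", codon463_patients),
    ("codon 463 T allele, controls", codon463_controls),
]:
    print(f"{label}: {freq:.2f}")
```

Rounded to two decimals, these counts give 0.01, 0.02 and 0.01, which agrees with the T-allele frequencies listed for codons 177 and 463.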

Family J2-20 shows bilineal inheritance of diabetes (FIG. 25). The R177X mutation, which was
maternally inherited, is associated with early-onset NIDDM, progression to insulin treatment and severe
complications. The earlier age at diagnosis in the proband and her brother may be due to the inheritance
of diabetes-susceptibility genes from both parents. The paternal diabetes gene which may potentiate the
effect of the HNF-1β mutation is unknown but is not another known MODY gene, as mutations were not
found in the HNF-1α, HNF-4α and glucokinase genes of the proband (Iwasaki et al., 1997; Furuta
et al., 1997; Iwasaki et al., 1995). The proband's older brother had been healthy until developing a
common cold and died one week later of diabetic ketoacidosis. The proband's maternal grandparents,
both of whom are deceased, were not known to have diabetes. However, she has a maternal uncle with
mild diet-controlled NIDDM diagnosed at 60 years of age. The difference in phenotype between the
proband's mother and maternal uncle and the absence of diabetes in the maternal grandparents suggest
that the R177X mutation may represent a new mutation in the proband's mother. The father and two
paternal uncles have late-onset NIDDM treated with oral hypoglycemic agents. The proband's paternal
grandmother was reported to have had diabetes. The presence of MODY and late-onset NIDDM within
the same family is not unusual and has been reported previously (Bell et al., 1991). With respect to the
presence of nephropathy in the subjects with the R177X mutation in HNF-1β, it is interesting to note
that HNF-1β is expressed at highest levels in kidney (Mendel et al., 1991a; De Simone et al., 1991;
Rey-Campos et al., 1991; Bach and Yaniv, 1993; Lazzaro et al., 1992) and perhaps decreased
levels of this transcription factor contribute to renal dysfunction.

HNF-1β contains a bipartite DNA binding region consisting of a POU-like element and a
homeodomain (Mendel et al., 1991a; De Simone et al., 1991; Rey-Campos et al., 1991; Bach and
Yaniv, 1993). The R177X mutation is located at the end of the POU-like domain and generates a protein
of 176 amino acids having the NH2-dimerization and POU domains (Cereghini, 1996; Mendel et al.,
1991a; De Simone et al., 1991; Rey-Campos et al., 1991; Bach and Yaniv, 1993). This truncated
protein cannot stimulate transcription of a rat albumin promoter-linked reporter gene and does not inhibit
the activity of wild-type HNF-1β (Table 16). This suggests that the R177X mutation represents a loss of
function mutation which results in decreased HNF-1β levels and a corresponding reduction in expression
of HNF-1β target genes.
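
The codon changes discussed above can be classified mechanically with the standard genetic code. The following is a minimal sketch; the helper names `classify_substitution` and `truncated_length` are hypothetical and serve only to illustrate why CGA→TGA at codon 177 is a nonsense mutation giving a 176-residue product, while GCC→GCT at codon 463 is silent.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(ref_codon, alt_codon):
    """Classify a single-codon change as silent, nonsense, stop-loss or missense."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "silent"
    if alt_aa == "*":
        return "nonsense"
    if ref_aa == "*":
        return "stop-loss"
    return "missense"


def truncated_length(stop_codon_number):
    """Number of residues translated before a premature stop at the given codon."""
    return stop_codon_number - 1


print(classify_substitution("CGA", "TGA"))  # codon 177: Arg -> stop
print(classify_substitution("GCC", "GCT"))  # codon 463: Ala -> Ala
print(truncated_length(177))                # residues in the R177X product
```

The three printed values are `nonsense`, `silent` and `176`, matching the description of R177X and of the codon 463 variant.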

Table 16.

Transactivation activity of human HNF-1β and R177X mutation.

Construct                               Normalized Activity
                                        (Firefly luciferase/Renilla luciferase)

pcDNA3.1                                3.5 ± 0.5
pcDNA3.1-HNF-1β                         25.1 ± 3.2
pcDNA3.1-R177X                          3.8 ± 1.0
pcDNA3.1-HNF-1β + pcDNA3.1-R177X        32.2 ± 2.8

The activity of each construct was measured in triplicate and the mean ± SD is shown;
results are representative of at least two independent experiments.
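
The normalization behind Table 16 can be sketched as follows. This is an illustration under stated assumptions, not the inventors' analysis code: each well's firefly signal is divided by its Renilla signal, the triplicate ratios are summarized as mean and sample SD (the text does not say whether the sample or population SD was used), and the raw readings in the example are hypothetical. Only the four normalized activities are taken from the table.

```python
from statistics import mean, stdev


def normalized_activity(firefly, renilla):
    """Mean and sample SD of per-well firefly/Renilla ratios."""
    ratios = [f / r for f, r in zip(firefly, renilla)]
    return mean(ratios), stdev(ratios)


def fold_over_vector(activity, vector_activity):
    """Activity expressed relative to the empty-vector control."""
    return activity / vector_activity


# Hypothetical triplicate readings, used only to show the calculation.
avg, sd = normalized_activity([100.0, 200.0, 300.0], [100.0, 100.0, 100.0])
print(f"example: {avg:.1f} +/- {sd:.1f}")

# Normalized activities as reported in Table 16.
table16 = {
    "pcDNA3.1": 3.5,
    "pcDNA3.1-HNF-1beta": 25.1,
    "pcDNA3.1-R177X": 3.8,
    "pcDNA3.1-HNF-1beta + pcDNA3.1-R177X": 32.2,
}
for construct, activity in table16.items():
    fold = fold_over_vector(activity, table16["pcDNA3.1"])
    print(f"{construct}: {fold:.1f}-fold over empty vector")
```

Expressed this way, wild-type HNF-1β gives roughly 7-fold activation over the empty vector, R177X stays near 1-fold, and co-transfection does not reduce the wild-type signal, which is the basis for the loss-of-function interpretation.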
Human DCoH is a protein of 104 amino acids (including the initiating methionine) (Thony et al.,
1995). Exons 2-4, which encode amino acids 2-104, were screened for mutations in the 57 unrelated
Japanese subjects with MODY described above. The sequences were identical to one another except for
an A→G polymorphism located in the 3'-untranslated region (Table 15), the frequency of which was not
different between MODY and nondiabetic subjects. Thus, mutations in DCoH do not appear to contribute
to the development of MODY in Japanese.

The frequency of HNF-1β mutations in the inventors' study population of Japanese subjects with
MODY is 2% (1/57), which is the same as for mutations in HNF-4α (Furuta et al., 1997), whereas the
frequency of HNF-1α mutations is about 8% (Iwasaki et al., 1997) (the frequency of glucokinase
mutations in this sample is unknown). However, genetic variation in HNF-1β or DCoH is unlikely to be a
major factor contributing to the more common late-onset NIDDM, as there is no evidence for linkage of
markers adjacent to these genes with diabetes in Japanese or Mexican American affected sib pairs
(Hanis et al., 1996).

The association of a mutation in HNF-1β with diabetes indicates the importance of the HNF-regulatory
network in determining pancreatic β-cell function. Moreover, HNF-1α is not able to compensate
for the reduction in HNF-1β activity, implying that the primary target genes for these transcription factors
in pancreatic β-cells are different. The identification of these target genes will provide a better
understanding of the molecular mechanisms that determine normal β-cell function and may lead to new
approaches for treating diabetes.

EXAMPLE 9 

Elucidation of the Genes Responsible for Additional MODY Disease States 
The inventors have identified that various MODY-type diabetes disease states are caused by
mutations in various HNF proteins in the diseased individuals. However, the inventors are also aware of
families that exhibit classic "MODY" disease states that are not caused by mutations in HNF-1α, HNF-1β,
or HNF-4α. Therefore, one aspect of this invention is to continue to screen the genetic complement of
these families to determine the genes that cause these additional MODY disease states. Such screening
can be done in the manner successfully used by the inventors to screen for the causes of MODY1,
MODY2, and MODY3. One of ordinary skill will be able and motivated, in view of the teachings of this
application, to work towards elucidating genes that, when mutated, cause additional MODY disease
states. Once such genes are elucidated, all diagnostic, treatment, and other aspects of the
invention will be realizable by those of skill in the art for those additional MODY causations. In order to
achieve these aspects of the invention, one will simply have to modify procedures and protocols taught in
this specification to be appropriate to the specific gene determined to cause a MODY disease state.

* * *

All of the compositions and/or methods disclosed and claimed herein can be made and executed 
without undue experimentation in light of the present disclosure. While the compositions and methods of 
this invention have been described in terms of preferred embodiments, it will be apparent to those of skill 
in the art that variations may be applied to the compositions and/or methods and in the steps or in the 
sequence of steps of the method described herein without departing from the concept, spirit and scope of 
the invention. More specifically, it will be apparent that certain agents which are both chemically and 
physiologically related may be substituted for the agents described herein while the same or similar 
results would be achieved. All such similar substitutes and modifications apparent to those skilled in the 
art are deemed to be within the spirit, scope and concept of the invention as defined by the appended 
claims. 
SEQUENCE LISTING



(1) GENERAL INFORMATION: 
(F) POSTAL CODE (ZIP): Unknown



(ii) TITLE OF INVENTION: MUTATIONS IN THE DIABETES SUSCEPTIBILITY 
GENES HEPATOCYTE NUCLEAR FACTOR (HNF) 1 ALPHA, HNF-1BETA 
and HNF-4 ALPHA



(iii) NUMBER OF SEQUENCES: 147 

(iv) COMPUTER READABLE FORM: 

(A) MEDIUM TYPE: Floppy disk 

(B) COMPUTER: IBM PC compatible 

(C) OPERATING SYSTEM: PC-DOS/MS-DOS 

(D) SOFTWARE: PatentIn Release #1.0, Version #1.30 (EPO)



(vi) PRIOR APPLICATION DATA:

(A) APPLICATION NUMBER: US Unknown 

(B) FILING DATE: 09-SEP-1996 

(vi) PRIOR APPLICATION DATA: 

(A) APPLICATION NUMBER: US 60/029,679

(B) FILING DATE: 30-OCT-1996 

(vi) PRIOR APPLICATION DATA: 

(A) APPLICATION NUMBER: US 60/028,056 

(B) FILING DATE: 02-OCT-1996 

(vi) PRIOR APPLICATION DATA: 

(A) APPLICATION NUMBER: US 60/025,719 

(B) FILING DATE: 10-SEP-1996 



(2) INFORMATION FOR SEQ ID NO : 1: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 3238 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ix) FEATURE: 

(A) NAME/KEY: modified_base

(B) LOCATION: 988

(D) OTHER INFORMATION: /mod_base= OTHER
/note= "N = A, C, G, or T"



(ix) FEATURE: 

(A) NAME/KEY: CDS

(B) LOCATION: join(24..986, 990..1916)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1: 

CGTGGCCCTG TGGCAGCCGA GCC ATG GTT TCT AAA CTG AGC CAG CTG CAG 50
Met Val Ser Lys Leu Ser Gln Leu Gln
1 5

ACG GAG CTC CTG GCG GCC CTG CTC GAG TCA GGG CTG AGC AAA GAG GCA 98
Thr Glu Leu Leu Ala Ala Leu Leu Glu Ser Gly Leu Ser Lys Glu Ala
10 15 20 25

CTG ATC CAG GCA CTG GGT GAG CCG GGG CCC TAG CTC CTG GCT «.» ™» 
L6U ^ «« >™ «y P f o Tyf "J S All £y 



35 40 



GGC CCC CTG GAC AAG GGG GAG TCC TGC GGC GGC GGT CGA GGG GAG CTG 
Gly Pro Leu Asp Lys Gly Glu Ser Cys Gly Gly Gly Ty Su 2! 



55 



GCT GAG CTG CCC AAT GGG CTG GGG GAG ACT CGG GGC TCC GAG GAC GAG 
Ala Glu Leu Pro Asn Gly Leu Gly Glu Thr Arg Gly Ser ctu Tu 



65 70 



ACG GAC GAC GAT GGG GAA GAC TTC ACG CCA CCC ATC CTC AAA GAG CTG 
Thr Asp Asp Asp Gly Glu Asp P he Thr Pro Pro Ile ™ ^ GAG £ G 



80 



85 



GAG AAC CTC AGC CCT GAG GAG GCG GCC CAC CAG AAA GCC GTG GTG GAC 
Glu Asn Leu Ser Pro Glu Glu Ala Ala His Gin Lys 2a vll SS Tu 



100 



105 



THr IZ Su £f° £° ° CG TGG CGT GTG GCG **° ATG GTC AAG TCC 

Thr Leu Leu Gin Glu Asp Pro Trp Arg Val Ala Lys Met Val Lys Ser 

115 120 

TAC CTG CAG CAG CAC AAC ATC CCA CAG CGG GAG GTG GTC GAT ACC ACT 
Tyr Leu Gin Gin His Asn He Pro Gin Arg Glu Val Val Asp Jhr i£ 
125 "0 135 

GGC ?l C f C «° TCC «C CTG TCC CAA CAC CTC AAC AAG GGC ACT CCC 
Gly Leu Asn Gin Ser His Leu Ser Gin His Leu Asn Lys Gly £ £o 

145 150 

ATG AAG ACG CAG AAG CGG GCC GCC CTG TAC ACC TGG TAC GTC CGC AAG 
Met Lys Thr Gin Lys Arg Ala Ala Leu Tyr Thr Trp T Jr Tl Arg £s 

160 



165 



Sn 2* S° 00 ^ TTC ACC CAT GCA GGG GGA GGG CTG 

Gin Arg Glu Val Ala Gin Gin Phe Thr His Ala Gly Gin Gly oiy Su 

175 180 1B5 

ill Jf° CC ° ACA GGT GAT GAG CTA CCA AC ^ AAG AAG GGG CGG AGG 

He Glu Glu Pro Thr Gly Asp Glu Leu Pro Thr Lys Lys Gly JJJ J2 

190 1" 20 o 

AAC CGT TTC AAG TGG GGC CCA GCA TCC CAG CAG ATC CTG TTC CAG GCC 
Asn Arg Phe Lys Trp Gly Pro Ala Ser Gin Gin lie Leu Phe Gin 111 
205 210 215 

?yi tin s s G r 3 ? c cct agc ^ gag gag cga gag acg c * a gtg 

Tyr Glu Arg Gin Lys Asn Pro Ser Lys Glu Glu Arg Glu Thr Leu Val 
220 2 25 230 

GAG GAG TGC AAT AGG GCG GAA TGC ATC CAG AGA GGG GTG TCC CCA TCA 
Glu Glu Cys Asn Arg Ala Glu Cys He Gin Arg Gly Val Ser Pro Ser* 
235 240 24 5 



146 



194 



242 



290 



338 



386 



434 



482 



530 



578 



626 



674 



722 



770 






CAG GCA CAG GGG CTG GGC TCC AAC CTC GTC ACG GAG GTG CGT GTC TAG 
Sn Ali Gin Gly Leu Gly Ser Asn Leu Val Thr Glu Val Arg Val Tyr 
250 255 260 265 

AAC TGG TTT GCC AAC CGG CGC AAA GAA GAA GCC TTC CGG CAC AAG CTG 
itn S Ala Asn Arg Arg Lys Glu Glu Ala Phe Arg Hxs Lys Leu 

270 275 280 

GCC ATG GAC ACG TAC AGC GGG CCC CCC CCA GGG CCA GGC CCG GGA CCT 
til Met Asp Thr Tyr Ser Gly Pro Pro Pro Gly Pro Gly Pro Gly Pro 
285 290 295 



GCG CTG CCC GCT CAC AGC TCC CCT GGC CTG CCT CCA CCT GCC CTC TCC 
lit Leu Pro Ala His Ser Ser Pro Gly Leu Pro Pro Pro Ala Leu Ser 
300 305 310 

CCC AGT AAG GTC CAC GGT GTG CGC TNT GGA CAG CCT GCG ACC AGT GAG 
Pro Sr Lys Val His Gly Val Arg Gly Gin Pro Ala Thr Ser Glu 

3!S 320 325 



ACT GCA GAA GTA CCC TCA AGC AGC GGC GGT CCC TTA GTG ACA GTG TCT 
rll Ala Glu val Pro Ser Ser Ser Gly Gly Pro Leu Val Thr Val Ser 
330 335 340 

ACA CCC CTC CAC CAR GTG TCC CCC ACG GGC CTG GAG CCC AGC CAC AGC 
Thr Pro Leu His Gin Val Ser Pro Thr Gly Leu Glu Pro Ser Hxs Ser 
345 350 355 360 

CTG CTG AGT ACA GAA GCC AAG CTG GTC TCA GCA GCT GGG GGC CCC CTC 
Leu Leu Ser Thr Glu Ala Lys Leu Val Ser Ala Ala Gly Gly Pro Leu 
365 370 375 

CCC CCT GTC AGC ACC CTG ACA GCA CTG CAC AGC TTG GAG CAG ACA TCC 
Pro Pro Val Ser Thr Leu Thr Ala Leu His Ser Leu Glu Gin Thr Ser 
380 385 390 

CCA GGC CTC AAC CAG CAG CCC CAG AAC CTC ATC ATG GCC TCA CTT CCT 
Pro Gly Leu Asn Gin Gin Pro Gin Asn Leu He Met Ala Ser Leu Pro 
395 400 405 

GGG GTC ATG ACC ATC GGG CCT GGT GAG CCT GCC TCC CTG GGT CCT ACG 
Gly Val Met Thr He Gly Pro Gly Glu Pro Ala Ser Leu Gly Pro Thr 
410 415 420 

TTC ACC AAC ACA GGT GCC TCC ACC CTG GTC ATC GGC CTG GCC TCC ACQ 
Phe Thr Asn Thr Gly Ala Ser Thr Leu Val He Gly Leu Ala Ser Thr 

435 440 



425 «0 



CAG GCA CAG AGT GTG CCG GTC ATC AAC AGC ATG GGC AGC AGC CTG ACC 
Gin Ala Gin Ser Val Pro Val He Asn Ser Met Gly Ser Ser Leu Thr 



445 



450 455 



ACC CTG CAG CCC GTC CAG TTC TCC CAG CCG CTG CAC CCC TCC TAC CAG 
Thr Leu Gin Pro Val Gin Phe Ser Gin Pro Leu His Pro Ser Tyr Gin 
460 «5 470 



618 



866 



914 



962 



1010 



1058 



1106 



1154 



1202 



1250 



1298 



1346 



1394 



X442 






2 2 2 2 s 2 s - - - s ~ «. = « « 

480 485 
ATG GCC ACC ATG GCT CAG CTG CAG AGC err ™„ 

Met Ala Thr Met Ala Gin Leu Gin Set Pro 2S ^ ™ AG ° CAC 

490 4 „ " Ser Pro Hls Ala Leu Tyr Ser His 

495 500 

S s - s s 2 2 s 2 s 5 2 2 2 - s 

= = 22 2 2 s 2 2222 2 2 2 2 

535 

CCC ACC AAG CAG GTC TTC ACC Tra pap 

- * , y . jj. « 2 « 2 « s ~ « s = « = 

545 550 

2 = 2 2 2 2 2 2 2 g 2 2 2 2 2 2 

560 565 

= 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 

b75 580 

S Ser £ £ S E E £ sf ™ C ™ TCA 

585 „' SSr Ser Ser Leu Va l Leu Tyr Gin Ser Ser 

90 595 600 

5 = 2 2 2 2 2 2 2 2 2 2 2 2 2 2 
2 o2 2 2 2 2 2 2 2 2 2 2 2 2 " 



625 630 



1490 



1538 



1586 



1634 



1682 



1730 



1778 



1826 



1874 



1916 



TAACCACGGC ACCTGGGCCC TGGGGCCTGT ACTGCCTGCT TGGGGGGTGA TGAGGGCAGC 1976
AGCCAGCCCT GCCTGGAGGA CCTGAGCCTG CCGAGCAACC GTGGCCCTTC CTGGACAGCT 2036
GTGCCTCGCT CCCCACTCTG CTCTGATGCA TCAGAAAGGG AGGGCTCTGA GGCGCCCCAA 2096
CCCGTGGAGG CTGCTCGGGG TGCACAGGAG GGGGTCGTGG AGAGCTAGGA GCAAAGCCTG 2156
TTCATGGCAG ATGTAGGAGG GACTGTCGCT GCTTCGTGGG ATACAGTCTT CTTACTTGGA 2216
ACTGAAGGGG GCGGCCTATG ACTTGGGCAC CCCCAGCCTG GGCCTATGGA GAGCCCTGGG 2276
ACCGCTACAC CACTCTGGCA GCCACACTTC TCAGGACACA GGCCTGTGTA GCTGTGACCT 2336
GCTGAGCTCT GAGAGGCCCT GGATCAGCGT GGCCTTGTTC TGTCACCAAT GTACCCACCG 2396
GGCCACTCCT TCCTGCCCCA ACTCCTTCCA GCTAGTGACC CACATGCCAT TTGTACTGAC 2456



CCCATCACCT ACTCACACAG GCATTTCCTG GGTGGCTACT CTGTGCCAGA GCCTGGGGCT 2516
CTAACTGCCT GAGCCCAGGG AGGCCGAAGC TAACAGGGAA GGCAGGCAGG GCTCTCCTGG 2576
TCTTCCCATC CCCAGCGATT CCCTCTCCCA GGCCCCATGA CCTCCAGCTT TCCTGTATTT 2636
CTTCCCAAGA GCATGATGCC TCTGAGGCCA GCCTGGCCTC CTGCCTCTAC TGGGAAGGCT 2696
ACTTCGGGGC TGGGAAGTCG TCCTTACTCC TGTGGGAGCC TCGCAACCCG TGCCAAGTCC 2756
AGGTCCTGGT GGGGCAGCTC CTCTGTCTCG AGCGCCCTGC AGACCCTGCC CTTGTTTGGG 2816
GCAGGAGTAG CTGAGCTCAC AAGGCAGCAA GGCCCGAGCA GCTGAGCAGG GCCGGGGAAC 2876
TGGCCAAGCT GAGGTGCCCA GGAGAAGAAA GAGGTGACCC CAGGGCACAG GAGCTACCTG 2936
TGTGGACAGG ACTAACACTC AGAAGCCTGG GTGCCTGGCT GGCTGAGGGC AGTTCGCAGC 2996
CACCCTGAGG AGTCTGAGGT CCTGAGCACT GCCAGGAGGG ACAAAGGAGC CTGTGAACCC 3056
AGGACAAGCA TGGTCCCACA TCCCTGGGCC TGCTGCTGAG AACCTGGCCT TCAGTGTACC 3116
GCGTCTACCC TGGGATTCAG GAAAAGGCCT GGGGTGACCC GGCACCCCCT GCAGCTTGTA 3176
GCCAGCCGGG GCGAGTGGCA CGTTTATTTA ACTTTTAGTA AAGTCAAGGA GAAATGCGGT 3236
GG 3238

(2) INFORMATION FOR SEQ ID NO: 2:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 630 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:

Met Val Ser Lys Leu Ser Gln Leu Gln Thr Glu Leu Leu Ala Ala Leu
1 5 10 15

Leu Glu Ser Gly Leu Ser Lys Glu Ala Leu Ile Gln Ala Leu Gly Glu
20 25 30

Pro Gly Pro Tyr Leu Leu Ala Gly Glu Gly Pro Leu Asp Lys Gly Glu
35 40 45

Ser Cys Gly Gly Gly Arg Gly Glu Leu Ala Glu Leu Pro Asn Gly Leu
50 55 60

Gly Glu Thr Arg Gly Ser Glu Asp Glu Thr Asp Asp Asp Gly Glu Asp
65 70 75 80

Phe Thr Pro Pro Ile Leu Lys Glu Leu Glu Asn Leu Ser Pro Glu Glu
85 90 95



Ala Ala His Gln Lys Ala Val Val Glu Thr Leu Leu Gln Glu Asp Pro
100 105 110

Trp Arg Val Ala Lys Met Val Lys Ser Tyr Leu Gln Gln His Asn Ile
115 120 125

Pro Gln Arg Glu Val Val Asp Thr Thr Gly Leu Asn Gln Ser His Leu
130 135 140

Ser Gln His Leu Asn Lys Gly Thr Pro Met Lys Thr Gln Lys Arg Ala
145 150 155 160

Ala Leu Tyr Thr Trp Tyr Val Arg Lys Gln Arg Glu Val Ala Gln Gln
165 170 175

Phe Thr His Ala Gly Gln Gly Gly Leu Ile Glu Glu Pro Thr Gly Asp
180 185 190

Glu Leu Pro Thr Lys Lys Gly Arg Arg Asn Arg Phe Lys Trp Gly Pro
195 200 205

Ala Ser Gln Gln Ile Leu Phe Gln Ala Tyr Glu Arg Gln Lys Asn Pro
210 215 220

Ser Lys Glu Glu Arg Glu Thr Leu Val Glu Glu Cys Asn Arg Ala Glu
225 230 235 240

Cys Ile Gln Arg Gly Val Ser Pro Ser Gln Ala Gln Gly Leu Gly Ser
245 250 255

Asn Leu Val Thr Glu Val Arg Val Tyr Asn Trp Phe Ala Asn Arg Arg
260 265 270

Lys Glu Glu Ala Phe Arg His Lys Leu Ala Met Asp Thr Tyr Ser Gly
275 280 285

Pro Pro Pro Gly Pro Gly Pro Gly Pro Ala Leu Pro Ala His Ser Ser
290 295 300

Pro Gly Leu Pro Pro Pro Ala Leu Ser Pro Ser Lys Val His Gly Val
305 310 315 320

Arg Gly Gln Pro Ala Thr Ser Glu Thr Ala Glu Val Pro Ser Ser Ser
325 330 335

Gly Gly Pro Leu Val Thr Val Ser Thr Pro Leu His Gln Val Ser Pro
340 345 350

Thr Gly Leu Glu Pro Ser His Ser Leu Leu Ser Thr Glu Ala Lys Leu
355 360 365

Val Ser Ala Ala Gly Gly Pro Leu Pro Pro Val Ser Thr Leu Thr Ala
370 375 380

Leu His Ser Leu Glu Gln Thr Ser Pro Gly Leu Asn Gln Gln Pro Gln
385 390 395 400



Asn Leu Ile Met Ala Ser Leu Pro Gly Val Met Thr Ile Gly Pro Gly
405 410 415

Glu Pro Ala Ser Leu Gly Pro Thr Phe Thr Asn Thr Gly Ala Ser Thr
420 425 430

Leu Val Ile Gly Leu Ala Ser Thr Gln Ala Gln Ser Val Pro Val Ile
435 440 445

Asn Ser Met Gly Ser Ser Leu Thr Thr Leu Gln Pro Val Gln Phe Ser
450 455 460

Gln Pro Leu His Pro Ser Tyr Gln Gln Pro Leu Met Pro Pro Val Gln
465 470 475 480

Ser His Val Thr Gln Ser Pro Phe Met Ala Thr Met Ala Gln Leu Gln
485 490 495

Ser Pro His Ala Leu Tyr Ser His Lys Pro Glu Val Ala Gln Tyr Thr
500 505 510

His Thr Gly Leu Leu Pro Gln Thr Met Leu Ile Thr Asp Thr Thr Asn
515 520 525

Leu Ser Ala Leu Ala Ser Leu Thr Pro Thr Lys Gln Val Phe Thr Ser
530 535 540

Asp Thr Glu Ala Ser Ser Glu Ser Gly Leu His Thr Pro Ala Ser Gln
545 550 555 560

Ala Thr Thr Leu His Val Pro Ser Gln Asp Pro Ala Gly Ile Gln His
565 570 575

Leu Gln Pro Ala His Arg Leu Ser Ala Ser Pro Thr Val Ser Ser Ser
580 585 590

Ser Leu Val Leu Tyr Gln Ser Ser Asp Ser Ser Asn Gly Gln Ser His
595 600 605

Leu Leu Pro Ser Asn His Ser Val Ile Glu Thr Phe Ile Ser Thr Gln
610 615 620

Met Ala Ser Ser Ser Gln
625 630



(2) INFORMATION FOR SEQ ID NO : 3: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 3238 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 



(ix) FEATURE: 

(A) NAME/KEY: modified_base

(B) LOCATION: 988

(D) OTHER INFORMATION: /mod_base= OTHER
/note= "N = A, C, G, or T"

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: join(24..986, 990..1916)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:

CGTGGCCCTG TGGCAGCCGA GCC ATG GTT TCT AAA CTG AGC CAG CTG CAG

Met Val Ser Lys Leu Ser Gln Leu Gln
1 5

ACG GAG CTC CTG GCG GCC CTG CTC GAG TCA GGG CTG AGC AAA GAG GCA
Thr Glu Leu Leu Ala Ala Leu Leu Glu Ser Gly Leu Ser Lys Glu Ala
10 15 20 25



3° 000 CC ° TAC CTC CTG GG * GGA GAA 

Leu He Gin Ala Leu Gly Glu Pro Gly Pro Tyr Leu Leu Ala Gly Glu 
30 35 



40 



GGC CCC CTG GAC AAG GGG GAG TCC TGC GGC GGC GGT CGA GGG GAG CTG 
Gly Pro Leu Asp Lys Gly Glu Ser Cys Gly Gly Gly Arg Gly Su Su 
45 50 55 

GCT GAG CTG CCC AAT GGG CTG GGG GAG ACT CGG GGC TCC GAG GAC GAG 
Ala Glu Leu Pro Asn Gly Leu Gly Glu Thr Arg Gly Ser Glu Asp SS 



65 



70 



iS T t GAA GA ° WC ACG CCA CCC ATC CTC AAA GAG CTG 

Thr Asp Asp Asp Gly Glu Asp Phe Thr Pro Pro lie Leu Lys Glu Su 

80 8S 

Gil A^n S C f C ° AG GCG GCC ^ « *** GCG C3TG GTG GAG 

Glu Asn Leu Ser Pro Glu Glu Ala Ala His Gin Lys Ala Val Val Glu 

9S 100 105 

ACC CTT CTG CAG GAG GAC CCG TGG CGT GTG GCG AAG ATG GTC AAG TCC 
Thr Leu Leu Gin Glu Asp Pro Trp Arg Val Ala Lys Met vll I£ E 
110 US 120 

TAC CTG CAG CAG CAC AAC ATC CCA CAG CAG GAG GTG GTC GAT ACC ACT 
Tyr Leu Gin Gin His Asn He Pro Gin Gin Glu Val Val Asp Jhr JE 
I 25 130 135 

Su 2° CTG TCC CAA ^ CTC ^ ™ GG C ACT CCC 

Gly Leu Asn Gin Ser His Leu Ser Gin His Leu Asn Lys Gly Thr Pro 

140 145 150 

m! G ™° G CAG ^ CGG GCC GCC CTG TAC A CC TGG TAC GTC CGC AAG 

Met Lys Thr Gin Lys Arg Ala Ala Leu Tyr Thr Trp Tyr Val Arg £s 
15S 160 16S 

111 A~ GAG GCG ° AG CAG TTC ACC ^ GCA GGG CAG GGA GGG CTG 

Gin Arg Glu Val Ala Gin Gin Phe Thr His Ala Gly Gin Gly Gly Leu 

175 180 185 



50 



98 



146 



194 



24 2 



290 



338 



386 



434 



482 



530 



578 






ATT GAA GAG CCC ACA GGT GAT GAG CTA CCA ACC AAG AAG GGG CGG AGG 626
Ile Glu Glu Pro Thr Gly Asp Glu Leu Pro Thr Lys Lys Gly Arg Arg
190 195 200

AAC CGT TTC AAG TGG GGC CCA GCA TCC CAG CAG ATC CTG TTC CAG GCC 674
Asn Arg Phe Lys Trp Gly Pro Ala Ser Gln Gln Ile Leu Phe Gln Ala
205 210 215

TAT GAG AGG CAG AAG AAC CCT AGC AAG GAG GAG CGA GAG ACG CTA GTG 722
Tyr Glu Arg Gln Lys Asn Pro Ser Lys Glu Glu Arg Glu Thr Leu Val
220 225 230

GAG GAG TGC AAT AGG GCG GAA TGC ATC CAG AGA GGG GTG TCC CCA TCA 770
Glu Glu Cys Asn Arg Ala Glu Cys Ile Gln Arg Gly Val Ser Pro Ser
235 240 245

CAG GCA CAG GGG CTG GGC TCC AAC CTC GTC ACG GAG GTG CGT GTC TAC 818
Gln Ala Gln Gly Leu Gly Ser Asn Leu Val Thr Glu Val Arg Val Tyr
250 255 260 265

AAC TGG TTT GCC AAC CGG CGC AAA GAA GAA GCC TTC CGG CAC AAG CTG 866
Asn Trp Phe Ala Asn Arg Arg Lys Glu Glu Ala Phe Arg His Lys Leu
270 275 280

GCC ATG GAC ACG TAC AGC GGG CCC CCC CCA GGG CCA GGC CCG GGA CCT 914
Ala Met Asp Thr Tyr Ser Gly Pro Pro Pro Gly Pro Gly Pro Gly Pro
285 290 295

GCG CTG CCC GCT CAC AGC TCC CCT GGC CTG CCT CCA CCT GCC CTC TCC 962
Ala Leu Pro Ala His Ser Ser Pro Gly Leu Pro Pro Pro Ala Leu Ser
300 305 310

CCC AGT AAG GTC CAC GGT GTG CGC TNT GGA CAG CCT GCG ACC AGT GAG 1010
Pro Ser Lys Val His Gly Val Arg Gly Gln Pro Ala Thr Ser Glu
315 320 325

ACT GCA GAA GTA CCC TCA AGC AGC GGC GGT CCC TTA GTG ACA GTG TCT 1058
Thr Ala Glu Val Pro Ser Ser Ser Gly Gly Pro Leu Val Thr Val Ser
330 335 340

ACA CCC CTC CAC CAA GTG TCC CCC ACG GGC CTG GAG CCC AGC CAC AGC 1106
Thr Pro Leu His Gln Val Ser Pro Thr Gly Leu Glu Pro Ser His Ser
345 350 355 360

CTG CTG AGT ACA GAA GCC AAG CTG GTC TCA GCA GCT GGG GGC CCC CTC 1154
Leu Leu Ser Thr Glu Ala Lys Leu Val Ser Ala Ala Gly Gly Pro Leu
365 370 375

CCC CCT GTC AGC ACC CTG ACA GCA CTG CAC AGC TTG GAG CAG ACA TCC 1202
Pro Pro Val Ser Thr Leu Thr Ala Leu His Ser Leu Glu Gln Thr Ser
380 385 390

CCA GGC CTC AAC CAG CAG CCC CAG AAC CTC ATC ATG GCC TCA CTT CCT 1250
Pro Gly Leu Asn Gln Gln Pro Gln Asn Leu Ile Met Ala Ser Leu Pro
395 400 405



GGG GTC ATG ACC ATC GGG CCT GGT GAG CCT GCC TCC CTG GGT CCT ACG 1298
Gly Val Met Thr Ile Gly Pro Gly Glu Pro Ala Ser Leu Gly Pro Thr
410 415 420

TTC ACC AAC ACA GGT GCC TCC ACC CTG GTC ATC GGC CTG GCC TCC ACG 1346
Phe Thr Asn Thr Gly Ala Ser Thr Leu Val Ile Gly Leu Ala Ser Thr
425 430 435 440

CAG GCA CAG AGT GTG CCG GTC ATC AAC AGC ATG GGC AGC AGC CTG ACC 1394
Gln Ala Gln Ser Val Pro Val Ile Asn Ser Met Gly Ser Ser Leu Thr
445 450 455

ACC CTG CAG CCC GTC CAG TTC TCC CAG CCG CTG CAC CCC TCC TAC CAG 1442
Thr Leu Gln Pro Val Gln Phe Ser Gln Pro Leu His Pro Ser Tyr Gln
460 465 470

CAG CCG CTC ATG CCA CCT GTG CAG AGC CAT GTG ACC CAG AGC CCC TTC 1490
Gln Pro Leu Met Pro Pro Val Gln Ser His Val Thr Gln Ser Pro Phe
475 480 485

ATG GCC ACC ATG GCT CAG CTG CAG AGC CCC CAC GCC CTC TAC AGC CAC 1538
Met Ala Thr Met Ala Gln Leu Gln Ser Pro His Ala Leu Tyr Ser His
490 495 500

AAG CCC GAG GTG GCC CAG TAC ACC CAC ACG GGC CTG CTC CCG CAG ACT 1586
Lys Pro Glu Val Ala Gln Tyr Thr His Thr Gly Leu Leu Pro Gln Thr
505 510 515 520

ATG CTC ATC ACC GAC ACC ACC AAC CTG AGC GCC CTG GCC AGC CTC ACG 1634
Met Leu Ile Thr Asp Thr Thr Asn Leu Ser Ala Leu Ala Ser Leu Thr
525 530 535

CCC ACC AAG CAG GTC TTC ACC TCA GAC ACT GAG GCC TCC AGT GAG TCC 1682
Pro Thr Lys Gln Val Phe Thr Ser Asp Thr Glu Ala Ser Ser Glu Ser
540 545 550

GGG CTT CAC ACG CCG GCA TCT CAG GCC ACC ACC CTC CAC GTC CCC AGC 1730
Gly Leu His Thr Pro Ala Ser Gln Ala Thr Thr Leu His Val Pro Ser
555 560 565

CAG GAC CCT GCC GGC ATC CAG CAC CTG CAG CCG GCC CAC CGG CTC AGC 1778
Gln Asp Pro Ala Gly Ile Gln His Leu Gln Pro Ala His Arg Leu Ser
570 575 580

GCC AGC CCC ACA GTG TCC TCC AGC AGC CTG GTG CTG TAC CAG AGC TCA 1826
Ala Ser Pro Thr Val Ser Ser Ser Ser Leu Val Leu Tyr Gln Ser Ser
585 590 595 600

GAC TCC AGC AAT GGC CAG AGC CAC CTG CTG CCA TCC AAC CAC AGC GTC 1874
Asp Ser Ser Asn Gly Gln Ser His Leu Leu Pro Ser Asn His Ser Val
605 610 615

ATC GAG ACC TTC ATC TCC ACC CAG ATG GCC TCT TCC TCC CAG 1916
Ile Glu Thr Phe Ile Ser Thr Gln Met Ala Ser Ser Ser Gln
620 625 630

TAACCACGGC ACCTGGGCCC TGGGGCCTGT ACTGCCTGCT TGGGGGGTGA TGAGGGCAGC 1976
AGCCAGCCCT GCCTGGAGGA CCTGAGCCTG CCGAGCAACC GTGGCCCTTC CTGGACAGCT 2036
GTGCCTCGCT CCCCACTCTG CTCTGATGCA TCAGAAAGGG AGGGCTCTGA GGCGCCCCAA 2096
CCCGTGGAGG CTGCTCGGGG TGCACAGGAG GGGGTCGTGG AGAGCTAGGA GCAAAGCCTG 2156
TTCATGGCAG ATGTAGGAGG GACTGTCGCT GCTTCGTGGG ATACAGTCTT CTTACTTGGA 2216
ACTGAAGGGG GCGGCCTATG ACTTGGGCAC CCCCAGCCTG GGCCTATGGA GAGCCCTGGG 2276
ACCGCTACAC CACTCTGGCA GCCACACTTC TCAGGACACA GGCCTGTGTA GCTGTGACCT 2336
GCTGAGCTCT GAGAGGCCCT GGATCAGCGT GGCCTTGTTC TGTCACCAAT GTACCCACCG 2396
GGCCACTCCT TCCTGCCCCA ACTCCTTCCA GCTAGTGACC CACATGCCAT TTGTACTGAC 2456
CCCATCACCT ACTCACACAG GCATTTCCTG GGTGGCTACT CTGTGCCAGA GCCTGGGGCT 2516
CTAACTGCCT GAGCCCAGGG AGGCCGAAGC TAACAGGGAA GGCAGGCAGG GCTCTCCTGG 2576
TCTTCCCATC CCCAGCGATT CCCTCTCCCA GGCCCCATGA CCTCCAGCTT TCCTGTATTT 2636
CTTCCCAAGA GCATGATGCC TCTGAGGCCA GCCTGGCCTC CTGCCTCTAC TGGGAAGGCT 2696
ACTTCGGGGC TGGGAAGTCG TCCTTACTCC TGTGGGAGCC TCGCAACCCG TGCCAAGTCC 2756
AGGTCCTGGT GGGGCAGCTC CTCTGTCTCG AGCGCCCTGC AGACCCTGCC CTTGTTTGGG 2816
GCAGGAGTAG CTGAGCTCAC AAGGCAGCAA GGCCCGAGCA GCTGAGCAGG GCCGGGGAAC 2876
TGGCCAAGCT GAGGTGCCCA GGAGAAGAAA GAGGTGACCC CAGGGCACAG GAGCTACCTG 2936
TGTGGACAGG ACTAACACTC AGAAGCCTGG GTGCCTGGCT GGCTGAGGGC AGTTCGCAGC 2996
CACCCTGAGG AGTCTGAGGT CCTGAGCACT GCCAGGAGGG ACAAAGGAGC CTGTGAACCC 3056
AGGACAAGCA TGGTCCCACA TCCCTGGGCC TGCTGCTGAG AACCTGGCCT TCAGTGTACC 3116
GCGTCTACCC TGGGATTCAG GAAAAGGCCT GGGGTGACCC GGCACCCCCT GCAGCTTGTA 3176
GCCAGCCGGG GCGAGTGGCA CGTTTATTTA ACTTTTAGTA AAGTCAAGGA GAAATGCGGT 3236
GG 3238

(2) INFORMATION FOR SEQ ID NO: 4:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 630 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:

Met Val Ser Lys Leu Ser Gln Leu Gln Thr Glu Leu Leu Ala Ala Leu
1 5 10 15

Leu Glu Ser Gly Leu Ser Lys Glu Ala Leu Ile Gln Ala Leu Gly Glu
20 25 30

Pro Gly Pro Tyr Leu Leu Ala Gly Glu Gly Pro Leu Asp Lys Gly Glu
35 40 45

Ser Cys Gly Gly Gly Arg Gly Glu Leu Ala Glu Leu Pro Asn Gly Leu
50 55 60

Gly Glu Thr Arg Gly Ser Glu Asp Glu Thr Asp Asp Asp Gly Glu Asp
65 70 75 80

Phe Thr Pro Pro Ile Leu Lys Glu Leu Glu Asn Leu Ser Pro Glu Glu
85 90 95

Ala Ala His Gln Lys Ala Val Val Glu Thr Leu Leu Gln Glu Asp Pro
100 105 110

Trp Arg Val Ala Lys Met Val Lys Ser Tyr Leu Gln Gln His Asn Ile
115 120 125

Pro Gln Gln Glu Val Val Asp Thr Thr Gly Leu Asn Gln Ser His Leu
130 135 140

Ser Gln His Leu Asn Lys Gly Thr Pro Met Lys Thr Gln Lys Arg Ala
145 150 155 160

Ala Leu Tyr Thr Trp Tyr Val Arg Lys Gln Arg Glu Val Ala Gln Gln
165 170 175

Phe Thr His Ala Gly Gln Gly Gly Leu Ile Glu Glu Pro Thr Gly Asp
180 185 190

Glu Leu Pro Thr Lys Lys Gly Arg Arg Asn Arg Phe Lys Trp Gly Pro
195 200 205

Ala Ser Gln Gln Ile Leu Phe Gln Ala Tyr Glu Arg Gln Lys Asn Pro
210 215 220

Ser Lys Glu Glu Arg Glu Thr Leu Val Glu Glu Cys Asn Arg Ala Glu
225 230 235 240

Cys Ile Gln Arg Gly Val Ser Pro Ser Gln Ala Gln Gly Leu Gly Ser
245 250 255

Asn Leu Val Thr Glu Val Arg Val Tyr Asn Trp Phe Ala Asn Arg Arg
260 265 270

Lys Glu Glu Ala Phe Arg His Lys Leu Ala Met Asp Thr Tyr Ser Gly
275 280 285

Pro Pro Pro Gly Pro Gly Pro Gly Pro Ala Leu Pro Ala His Ser Ser
290 295 300

Pro Gly Leu Pro Pro Pro Ala Leu Ser Pro Ser Lys Val His Gly Val
305 310 315 320

Arg Gly Gln Pro Ala Thr Ser Glu Thr Ala Glu Val Pro Ser Ser Ser
325 330 335

Gly Gly Pro Leu Val Thr Val Ser Thr Pro Leu His Gln Val Ser Pro
340 345 350

Thr Gly Leu Glu Pro Ser His Ser Leu Leu Ser Thr Glu Ala Lys Leu
355 360 365

Val Ser Ala Ala Gly Gly Pro Leu Pro Pro Val Ser Thr Leu Thr Ala
370 375 380

Leu His Ser Leu Glu Gln Thr Ser Pro Gly Leu Asn Gln Gln Pro Gln
385 390 395 400

Asn Leu Ile Met Ala Ser Leu Pro Gly Val Met Thr Ile Gly Pro Gly
405 410 415

Glu Pro Ala Ser Leu Gly Pro Thr Phe Thr Asn Thr Gly Ala Ser Thr
420 425 430

Leu Val Ile Gly Leu Ala Ser Thr Gln Ala Gln Ser Val Pro Val Ile
435 440 445

Asn Ser Met Gly Ser Ser Leu Thr Thr Leu Gln Pro Val Gln Phe Ser
450 455 460

Gln Pro Leu His Pro Ser Tyr Gln Gln Pro Leu Met Pro Pro Val Gln
465 470 475 480

Ser His Val Thr Gln Ser Pro Phe Met Ala Thr Met Ala Gln Leu Gln
485 490 495

Ser Pro His Ala Leu Tyr Ser His Lys Pro Glu Val Ala Gln Tyr Thr
500 505 510

His Thr Gly Leu Leu Pro Gln Thr Met Leu Ile Thr Asp Thr Thr Asn
515 520 525

Leu Ser Ala Leu Ala Ser Leu Thr Pro Thr Lys Gln Val Phe Thr Ser
530 535 540

Asp Thr Glu Ala Ser Ser Glu Ser Gly Leu His Thr Pro Ala Ser Gln
545 550 555 560

Ala Thr Thr Leu His Val Pro Ser Gln Asp Pro Ala Gly Ile Gln His
565 570 575

Leu Gln Pro Ala His Arg Leu Ser Ala Ser Pro Thr Val Ser Ser Ser
580 585 590

Ser Leu Val Leu Tyr Gln Ser Ser Asp Ser Ser Asn Gly Gln Ser His
595 600 605

Leu Leu Pro Ser Asn His Ser Val Ile Glu Thr Phe Ile Ser Thr Gln
610 615 620

Met Ala Ser Ser Ser Gln
625 630



(2) INFORMATION FOR SEQ ID NO: 5: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 3239 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ix) FEATURE: 

(A) NAME/KEY: modified_base 

(B) LOCATION: 989 

(D) OTHER INFORMATION: /mod_base= OTHER 
/note= "N = A, C, G, or T" 

(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 24..965 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5: 

CGTGGCCCTG TGGCAGCCGA GCC ATG GTT TCT AAA CTG AGC CAG CTG CAG 50 
Met Val Ser Lys Leu Ser Gln Leu Gln 
1 5 

ACG GAG CTC CTG GCG GCC CTG CTC GAG TCA GGG CTG AGC AAA GAG GCA 98 
Thr Glu Leu Leu Ala Ala Leu Leu Glu Ser Gly Leu Ser Lys Glu Ala 
10 15 20 25 

CTG ATC CAG GCA CTG GGT GAG CCG GGG CCC TAC CTC CTG GCT GGA GAA 146 
Leu Ile Gln Ala Leu Gly Glu Pro Gly Pro Tyr Leu Leu Ala Gly Glu 
30 35 40 

GGC CCC CTG GAC AAG GGG GAG TCC TGC GGC GGC GGT CGA GGG GAG CTG 194 
Gly Pro Leu Asp Lys Gly Glu Ser Cys Gly Gly Gly Arg Gly Glu Leu 
45 50 55 

GCT GAG CTG CCC AAT GGG CTG GGG GAG ACT CGG GGC TCC GAG GAC GAG 242 
Ala Glu Leu Pro Asn Gly Leu Gly Glu Thr Arg Gly Ser Glu Asp Glu 
60 65 70 

ACG GAC GAC GAT GGG GAA GAC TTC ACG CCA CCC ATC CTC AAA GAG CTG 290 
Thr Asp Asp Asp Gly Glu Asp Phe Thr Pro Pro Ile Leu Lys Glu Leu 
75 80 85 

GAG AAC CTC AGC CCT GAG GAG GCG GCC CAC CAG AAA GCC GTG GTG GAG 338 
Glu Asn Leu Ser Pro Glu Glu Ala Ala His Gln Lys Ala Val Val Glu 
90 95 100 105 

ACC CTT CTG CAG GAG GAC CCG TGG CGT GTG GCG AAG ATG GTC AAG TCC 386 
Thr Leu Leu Gln Glu Asp Pro Trp Arg Val Ala Lys Met Val Lys Ser 
110 115 120 




TAC CTG CAG CAG CAC AAC ATC CCA CAG CGG GAG GTG GTC GAT ACC ACT 434 
Tyr Leu Gln Gln His Asn Ile Pro Gln Arg Glu Val Val Asp Thr Thr 
125 130 135 

GGC CTC AAC CAG TCC CAC CTG TCC CAA CAC CTC AAC AAG GGC ACT CCC 482 
Gly Leu Asn Gln Ser His Leu Ser Gln His Leu Asn Lys Gly Thr Pro 
140 145 150 

ATG AAG ACG CAG AAG CGG GCC GCC CTG TAC ACC TGG TAC GTC CGC AAG 530 
Met Lys Thr Gln Lys Arg Ala Ala Leu Tyr Thr Trp Tyr Val Arg Lys 
155 160 165 

CAG CGA GAG GTG GCG CAG CAG TTC ACC CAT GCA GGG CAG GGA GGG CTG 578 
Gln Arg Glu Val Ala Gln Gln Phe Thr His Ala Gly Gln Gly Gly Leu 
170 175 180 185 

ATT GAA GAG CCC ACA GGT GAT GAG CTA CCA ACC AAG AAG GGG CGG AGG 626 
Ile Glu Glu Pro Thr Gly Asp Glu Leu Pro Thr Lys Lys Gly Arg Arg 
190 195 200 

AAC CGT TTC AAG TGG GGC CCA GCA TCC CAG CAG ATC CTG TTC CAG GCC 674 
Asn Arg Phe Lys Trp Gly Pro Ala Ser Gln Gln Ile Leu Phe Gln Ala 
205 210 215 

TAT GAG AGG CAG AAG AAC CCT AGC AAG GAG GAG CGA GAG ACG CTA GTG 722 
Tyr Glu Arg Gln Lys Asn Pro Ser Lys Glu Glu Arg Glu Thr Leu Val 
220 225 230 

GAG GAG TGC AAT AGG GCG GAA TGC ATC CAG AGA GGG GTG TCC CCA TCA 770 
Glu Glu Cys Asn Arg Ala Glu Cys Ile Gln Arg Gly Val Ser Pro Ser 
235 240 245 

CAG GCA CAG GGG CTG GGC TCC AAC CTC GTC ACG GAG GTG CGT GTC TAC 818 
Gln Ala Gln Gly Leu Gly Ser Asn Leu Val Thr Glu Val Arg Val Tyr 
250 255 260 265 

AAC TGG TTT GCC AAC CGG CGC AAA GAA GAA GCC TTC CGG CAC AAG CTG 866 
Asn Trp Phe Ala Asn Arg Arg Lys Glu Glu Ala Phe Arg His Lys Leu 
270 275 280 

GCC ATG GAC ACG TAC AGC GGG CCC CCC CCC AGG GCC AGG CCC GGG ACC 914 
Ala Met Asp Thr Tyr Ser Gly Pro Pro Pro Arg Ala Arg Pro Gly Thr 
285 290 295 

TGC GCT GCC CGC TCA CAG CTC CCC TGG CCT GCC TCC ACC TGC CCT CTC 962 
Cys Ala Ala Arg Ser Gln Leu Pro Trp Pro Ala Ser Thr Cys Pro Leu 
300 305 310 

CCC CAGTAAGGTC CACGGTGTGC GCTNTGGACA GCCTGCGACC AGTGAGACTG 1015 
Pro 



CAGAAGTACC CTCAAGCAGC GGCGGTCCCT TAGTGACAGT GTCTACACCC CTCCACCAAG 1075 
TGTCCCCCAC GGGCCTGGAG CCCAGCCACA GCCTGCTGAG TACAGAAGCC AAGCTGGTCT 1135 

CAGCAGCTGG GGGCCCCCTC CCCCCTGTCA GCACCCTGAC AGCACTGCAC AGCTTGGAGC 1195 
AGACATCCCC AGGCCTCAAC CAGCAGCCCC AGAACCTCAT CATGGCCTCA CTTCCTGGGG 1255 
TCATGACCAT CGGGCCTGGT GAGCCTGCCT CCCTGGGTCC TACGTTCACC AACACAGGTG 1315 
CCTCCACCCT GGTCATCGGC CTGGCCTCCA CGCAGGCACA GAGTGTGCCG GTCATCAACA 1375 
GCATGGGCAG CAGCCTGACC ACCCTGCAGC CCGTCCAGTT CTCCCAGCCG CTGCACCCCT 1435 
CCTACCAGCA GCCGCTCATG CCACCTGTGC AGAGCCATGT GACCCAGAGC CCCTTCATGG 1495 
CCACCATGGC TCAGCTGCAG AGCCCCCACG CCCTCTACAG CCACAAGCCC GAGGTGGCCC 1555 
AGTACACCCA CACGGGCCTG CTCCCGCAGA CTATGCTCAT CACCGACACC ACCAACCTGA 1615 
GCGCCCTGGC CAGCCTCACG CCCACCAAGC AGGTCTTCAC CTCAGACACT GAGGCCTCCA 1675 
GTGAGTCCGG GCTTCACACG CCGGCATCTC AGGCCACCAC CCTCCACGTC CCCAGCCAGG 1735 
ACCCTGCCGG CATCCAGCAC CTGCAGCCGG CCCACCGGCT CAGCGCCAGC CCCACAGTGT 1795 
CCTCCAGCAG CCTGGTGCTG TACCAGAGCT CAGACTCCAG CAATGGCCAG AGCCACCTGC 1855 
TGCCATCCAA CCACAGCGTC ATCGAGACCT TCATCTCCAC CCAGATGGCC TCTTCCTCCC 1915 
AGTAACCACG GCACCTGGGC CCTGGGGCCT GTACTGCCTG CTTGGGGGGT GATGAGGGCA 1975 
GCAGCCAGCC CTGCCTGGAG GACCTGAGCC TGCCGAGCAA CCGTGGCCCT TCCTGGACAG 2035 
CTGTGCCTCG CTCCCCACTC TGCTCTGATG CATCAGAAAG GGAGGGCTCT GAGGCGCCCC 2095 
AACCCGTGGA GGCTGCTCGG GGTGCACAGG AGGGGGTCGT GGAGAGCTAG GAGCAAAGCC 2155 
TGTTCATGGC AGATGTAGGA GGGACTGTCG CTGCTTCGTG GGATACAGTC TTCTTACTTG 2215 
GAACTGAAGG GGGCGGCCTA TGACTTGGGC ACCCCCAGCC TGGGCCTATG GAGAGCCCTG 2275 
GGACCGCTAC ACCACTCTGG CAGCCACACT TCTCAGGACA CAGGCCTGTG TAGCTGTGAC 2335 
CTGCTGAGCT CTGAGAGGCC CTGGATCAGC GTGGCCTTGT TCTGTCACCA ATGTACCCAC 2395 
CGGGCCACTC CTTCCTGCCC CAACTCCTTC CAGCTAGTGA CCCACATGCC ATTTGTACTG 2455 
ACCCCATCAC CTACTCACAC AGGCATTTCC TGGGTGGCTA CTCTGTGCCA GAGCCTGGGG 2515 
CTCTAACTGC CTGAGCCCAG GGAGGCCGAA GCTAACAGGG AAGGCAGGCA GGGCTCTCCT 2575 
GGTCTTCCCA TCCCCAGCGA TTCCCTCTCC CAGGCCCCAT GACCTCCAGC TTTCCTGTAT 2635 
TTCTTCCCAA GAGCATGATG CCTCTGAGGC CAGCCTGGCC TCCTGCCTCT ACTGGGAAGG 2695 
CTACTTCGGG GCTGGGAAGT CGTCCTTACT CCTGTGGGAG CCTCGCAACC CGTGCCAAGT 2755 
CCAGGTCCTG GTGGGGCAGC TCCTCTGTCT CGAGCGCCCT GCAGACCCTG CCCTTGTTTG 2815 
GGGCAGGAGT AGCTGAGCTC ACAAGGCAGC AAGGCCCGAG CAGCTGAGCA GGGCCGGGGA 2875 
ACTGGCCAAG CTGAGGTGCC CAGGAGAAGA AAGAGGTGAC CCCAGGGCAC AGGAGCTACC 2935 
TGTGTGGACA GGACTAACAC TCAGAAGCCT GGGTGCCTGG CTGGCTGAGG GCAGTTCGCA 2995 
GCCACCCTGA GGAGTCTGAG GTCCTGAGCA CTGCCAGGAG GGACAAAGGA GCCTGTGAAC 3055 
CCAGGACAAG CATGGTCCCA CATCCCTGGG CCTGCTGCTG AGAACCTGGC CTTCAGTGTA 3115 
CCGCGTCTAC CCTGGGATTC AGGAAAAGGC CTGGGGTGAC CCGGCACCCC CTGCAGCTTG 3175 
TAGCCAGCCG GGGCGAGTGG CACGTTTATT TAACTTTTAG TAAAGTCAAG GAGAAATGCG 3235 
GTGA 3239 
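
The CDS feature locations in this listing can be cross-checked against the stated lengths of the corresponding protein sequences. The following is a minimal sketch of that arithmetic, assuming 1-based inclusive coordinates as used in the feature tables; the helper names `cds_length` and `codon_count` are illustrative and are not part of the listing.

```python
import re


def cds_length(location: str) -> int:
    """Number of bases covered by a feature location string.

    Accepts a simple span such as '24..965' or a joined location such as
    'join(24..986, 990..1271)'. Coordinates are treated as 1-based and
    inclusive (an assumption matching the feature tables in this listing).
    """
    spans = re.findall(r"(\d+)\s*\.\.\s*(\d+)", location)
    if not spans:
        raise ValueError(f"no start..end span found in {location!r}")
    return sum(int(end) - int(start) + 1 for start, end in spans)


def codon_count(location: str) -> int:
    """Number of complete codons in the location; errors on a partial codon."""
    bases = cds_length(location)
    if bases % 3 != 0:
        raise ValueError(f"{location!r} covers {bases} bases, not a multiple of 3")
    return bases // 3


# SEQ ID NO: 5 lists CDS 24..965; SEQ ID NO: 6 is stated as 314 amino acids.
print(codon_count("24..965"))                    # 314
# SEQ ID NO: 7 lists join(24..986, 990..1271); SEQ ID NO: 8 is 415 amino acids.
print(codon_count("join(24..986, 990..1271)"))   # 415
```

A location whose base count is not a multiple of three raises an error, which can flag a coordinate that was mis-transcribed.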

(2) INFORMATION FOR SEQ ID NO: 6: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 314 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6: 

Met Val Ser Lys Leu Ser Gln Leu Gln Thr Glu Leu Leu Ala Ala Leu 
1 5 10 15 

Leu Glu Ser Gly Leu Ser Lys Glu Ala Leu Ile Gln Ala Leu Gly Glu 
20 25 30 

Pro Gly Pro Tyr Leu Leu Ala Gly Glu Gly Pro Leu Asp Lys Gly Glu 
35 40 45 

Ser Cys Gly Gly Gly Arg Gly Glu Leu Ala Glu Leu Pro Asn Gly Leu 
50 55 60 

Gly Glu Thr Arg Gly Ser Glu Asp Glu Thr Asp Asp Asp Gly Glu Asp 
65 70 75 80 

Phe Thr Pro Pro Ile Leu Lys Glu Leu Glu Asn Leu Ser Pro Glu Glu 
85 90 95 

Ala Ala His Gln Lys Ala Val Val Glu Thr Leu Leu Gln Glu Asp Pro 
100 105 110 

Trp Arg Val Ala Lys Met Val Lys Ser Tyr Leu Gln Gln His Asn Ile 
115 120 125 

Pro Gln Arg Glu Val Val Asp Thr Thr Gly Leu Asn Gln Ser His Leu 
130 135 140 

Ser Gln His Leu Asn Lys Gly Thr Pro Met Lys Thr Gln Lys Arg Ala 
145 150 155 160 

Ala Leu Tyr Thr Trp Tyr Val Arg Lys Gln Arg Glu Val Ala Gln Gln 
165 170 175 

Phe Thr His Ala Gly Gln Gly Gly Leu Ile Glu Glu Pro Thr Gly Asp 
180 185 190 

Glu Leu Pro Thr Lys Lys Gly Arg Arg Asn Arg Phe Lys Trp Gly Pro 
195 200 205 

Ala Ser Gln Gln Ile Leu Phe Gln Ala Tyr Glu Arg Gln Lys Asn Pro 
210 215 220 

Ser Lys Glu Glu Arg Glu Thr Leu Val Glu Glu Cys Asn Arg Ala Glu 
225 230 235 240 

Cys Ile Gln Arg Gly Val Ser Pro Ser Gln Ala Gln Gly Leu Gly Ser 
245 250 255 

Asn Leu Val Thr Glu Val Arg Val Tyr Asn Trp Phe Ala Asn Arg Arg 
260 265 270 

Lys Glu Glu Ala Phe Arg His Lys Leu Ala Met Asp Thr Tyr Ser Gly 
275 280 285 

Pro Pro Pro Arg Ala Arg Pro Gly Thr Cys Ala Ala Arg Ser Gln Leu 
290 295 300 

Pro Trp Pro Ala Ser Thr Cys Pro Leu Pro 
305 310 



(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 3236 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ix) FEATURE: 

(A) NAME/KEY: modified_base 

(B) LOCATION: 988 

(D) OTHER INFORMATION: /mod_base= OTHER 
/note= "N = A, C, G, or T" 

(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: join(24..986, 990..1271) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7: 

CGTGGCCCTG TGGCAGCCGA GCC ATG GTT TCT AAA CTG AGC CAG CTG CAG 50 
Met Val Ser Lys Leu Ser Gln Leu Gln 
1 5 

ACG GAG CTC CTG GCG GCC CTG CTC GAG TCA GGG CTG AGC AAA GAG GCA 98 
Thr Glu Leu Leu Ala Ala Leu Leu Glu Ser Gly Leu Ser Lys Glu Ala 
10 15 20 25 

CTG ATC CAG GCA CTG GGT GAG CCG GGG CCC TAC CTC CTG GCT GGA GAA 146 
Leu Ile Gln Ala Leu Gly Glu Pro Gly Pro Tyr Leu Leu Ala Gly Glu 
30 35 40 

GGC CCC CTG GAC AAG GGG GAG TCC TGC GGC GGC GGT CGA GGG GAG CTG 194 
Gly Pro Leu Asp Lys Gly Glu Ser Cys Gly Gly Gly Arg Gly Glu Leu 
45 50 55 

GCT GAG CTG CCC AAT GGG CTG GGG GAG ACT CGG GGC TCC GAG GAC GAG 242 
Ala Glu Leu Pro Asn Gly Leu Gly Glu Thr Arg Gly Ser Glu Asp Glu 
60 65 70 

ACG GAC GAC GAT GGG GAA GAC TTC ACG CCA CCC ATC CTC AAA GAG CTG 290 
Thr Asp Asp Asp Gly Glu Asp Phe Thr Pro Pro Ile Leu Lys Glu Leu 
75 80 85 

GAG AAC CTC AGC CCT GAG GAG GCG GCC CAC CAG AAA GCC GTG GTG GAG 338 
Glu Asn Leu Ser Pro Glu Glu Ala Ala His Gln Lys Ala Val Val Glu 
90 95 100 105 

ACC CTT CTG CAG GAG GAC CCG TGG CGT GTG GCG AAG ATG GTC AAG TCC 386 
Thr Leu Leu Gln Glu Asp Pro Trp Arg Val Ala Lys Met Val Lys Ser 
110 115 120 

TAC CTG CAG CAG CAC AAC ATC CCA CAG CGG GAG GTG GTC GAT ACC ACT 434 
Tyr Leu Gln Gln His Asn Ile Pro Gln Arg Glu Val Val Asp Thr Thr 
125 130 135 

GGC CTC AAC CAG TCC CAC CTG TCC CAA CAC CTC AAC AAG GGC ACT CCC 482 
Gly Leu Asn Gln Ser His Leu Ser Gln His Leu Asn Lys Gly Thr Pro 
140 145 150 

ATG AAG ACG CAG AAG CGG GCC GCC CTG TAC ACC TGG TAC GTC CGC AAG 530 
Met Lys Thr Gln Lys Arg Ala Ala Leu Tyr Thr Trp Tyr Val Arg Lys 
155 160 165 

CAG CGA GAG GTG GCG CAG CAG TTC ACC CAT GCA GGG CAG GGA GGG CTG 578 
Gln Arg Glu Val Ala Gln Gln Phe Thr His Ala Gly Gln Gly Gly Leu 
170 175 180 185 

ATT GAA GAG CCC ACA GGT GAT GAG CTA CCA ACC AAG AAG GGG CGG AGG 626 
Ile Glu Glu Pro Thr Gly Asp Glu Leu Pro Thr Lys Lys Gly Arg Arg 
190 195 200 

AAC CGT TTC AAG TGG GGC CCA GCA TCC CAG CAG ATC CTG TTC CAG GCC 674 
Asn Arg Phe Lys Trp Gly Pro Ala Ser Gln Gln Ile Leu Phe Gln Ala 
205 210 215 

TAT GAG AGG CAG AAG AAC CCT AGC AAG GAG GAG CGA GAG ACG CTA GTG 722 
Tyr Glu Arg Gln Lys Asn Pro Ser Lys Glu Glu Arg Glu Thr Leu Val 
220 225 230 

GAG GAG TGC AAT AGG GCG GAA TGC ATC CAG AGA GGG GTG TCC CCA TCA 770 
Glu Glu Cys Asn Arg Ala Glu Cys Ile Gln Arg Gly Val Ser Pro Ser 
235 240 245 

CAG GCA CAG GGG CTG GGC TCC AAC CTC GTC ACG GAG GTG CGT GTC TAC 818 
Gln Ala Gln Gly Leu Gly Ser Asn Leu Val Thr Glu Val Arg Val Tyr 
250 255 260 265 

AAC TGG TTT GCC AAC CGG CGC AAA GAA GAA GCC TTC CGG CAC AAG CTG 866 
Asn Trp Phe Ala Asn Arg Arg Lys Glu Glu Ala Phe Arg His Lys Leu 
270 275 280 

GCC ATG GAC ACG TAC AGC GGG CCC CCC CCA GGG CCA GGC CCG GGA CCT 914 
Ala Met Asp Thr Tyr Ser Gly Pro Pro Pro Gly Pro Gly Pro Gly Pro 
285 290 295 

GCG CTG CCC GCT CAC AGC TCC CCT GGC CTG CCT CCA CCT GCC CTC TCC 962 
Ala Leu Pro Ala His Ser Ser Pro Gly Leu Pro Pro Pro Ala Leu Ser 
300 305 310 

CCC AGT AAG GTC CAC GGT GTG CGC TNT GGA CAG CCT GCG ACC AGT GAG 1010 
Pro Ser Lys Val His Gly Val Arg Gly Gln Pro Ala Thr Ser Glu 
315 320 325 

ACT GCA GAA GTA CCC TCA AGC AGC GGC GGT CCC TTA GTG ACA GTG TCT 1058 
Thr Ala Glu Val Pro Ser Ser Ser Gly Gly Pro Leu Val Thr Val Ser 
330 335 340 

ACA CCC CTC CAC CAA GTG TCC CCC ACG GGC CTG GAG CCC AGC CAC AGC 1106 
Thr Pro Leu His Gln Val Ser Pro Thr Gly Leu Glu Pro Ser His Ser 
345 350 355 360 

CTG CTG AGT ACA GAA GCC AAG CTG GTC TCA GCA GCT GGG GGC CCC CTC 1154 
Leu Leu Ser Thr Glu Ala Lys Leu Val Ser Ala Ala Gly Gly Pro Leu 
365 370 375 

CCC CGT CAG CAC CCT GAC AGC ACT GCA CAG CTT GGA GCA GAC ATC CCC 1202 
Pro Arg Gln His Pro Asp Ser Thr Ala Gln Leu Gly Ala Asp Ile Pro 
380 385 390 

AGG CCT CAA CCA GCA GCC CCA GAA CCT CAT CAT GGC CTC ACT TCC TGG 1250 
Arg Pro Gln Pro Ala Ala Pro Glu Pro His His Gly Leu Thr Ser Trp 
395 400 405 

GGT CAT GAC CAT CGG GCC TGG TGAGCCTGCC TCCCTGGGTC CTACGTTCAC 1301 
Gly His Asp His Arg Ala Trp 
410 415 

CAACACAGGT GCCTCCACCC TGGTCATCGG CCTGGCCTCC ACGCAGGCAC AGAGTGTGCC 1361 
GGTCATCAAC AGCATGGGCA GCAGCCTGAC CACCCTGCAG CCCGTCCAGT TCTCCCAGCC 1421 
GCTGCACCCC TCCTACCAGC AGCCGCTCAT GCCACCTGTG CAGAGCCATG TGACCCAGAG 1481 
CCCCTTCATG GCCACCATGG CTCAGCTGCA GAGCCCCCAC GCCCTCTACA GCCACAAGCC 1541 
CGAGGTGGCC CAGTACACCC ACACGGGCCT GCTCCCGCAG ACTATGCTCA TCACCGACAC 1601 
CACCAACCTG AGCGCCCTGG CCAGCCTCAC GCCCACCAAG CAGGTCTTCA CCTCAGACAC 1661 
TGAGGCCTCC AGTGAGTCCG GGCTTCACAC GCCGGCATCT CAGGCCACCA CCCTCCACGT 1721 
CCCCAGCCAG GACCCTGCCG GCATCCAGCA CCTGCAGCCG GCCCACCGGC TCAGCGCCAG 1781 
CCCCACAGTG TCCTCCAGCA GCCTGGTGCT GTACCAGAGC TCAGACTCCA GCAATGGCCA 1841 
GAGCCACCTG CTGCCATCCA ACCACAGCGT CATCGAGACC TTCATCTCCA CCCAGATGGC 1901 
CTCTTCCTCC CAGTAACCAC GGCACCTGGG CCCTGGGGCC TGTACTGCCT GCTTGGGGGG 1961 
TGATGAGGGC AGCAGCCAGC CCTGCCTGGA GGACCTGAGC CTGCCGAGCA ACCGTGGCCC 2021 
TTCCTGGACA GCTGTGCCTC GCTCCCCACT CTGCTCTGAT GCATCAGAAA GGGAGGGCTC 2081 
TGAGGCGCCC CAACCCGTGG AGGCTGCTCG GGGTGCACAG GAGGGGGTCG TGGAGAGCTA 2141 
GGAGCAAAGC CTGTTCATGG CAGATGTAGG AGGGACTGTC GCTGCTTCGT GGGATACAGT 2201 
CTTCTTACTT GGAACTGAAG GGGGCGGCCT ATGACTTGGG CACCCCCAGC CTGGGCCTAT 2261 
GGAGAGCCCT GGGACCGCTA CACCACTCTG GCAGCCACAC TTCTCAGGAC ACAGGCCTGT 2321 
GTAGCTGTGA CCTGCTGAGC TCTGAGAGGC CCTGGATCAG CGTGGCCTTG TTCTGTCACC 2381 
AATGTACCCA CCGGGCCACT CCTTCCTGCC CCAACTCCTT CCAGCTAGTG ACCCACATGC 2441 
CATTTGTACT GACCCCATCA CCTACTCACA CAGGCATTTC CTGGGTGGCT ACTCTGTGCC 2501 
AGAGCCTGGG GCTCTAACTG CCTGAGCCCA GGGAGGCCGA AGCTAACAGG GAAGGCAGGC 2561 
AGGGCTCTCC TGGTCTTCCC ATCCCCAGCG ATTCCCTCTC CCAGGCCCCA TGACCTCCAG 2621 
CTTTCCTGTA TTTCTTCCCA AGAGCATGAT GCCTCTGAGG CCAGCCTGGC CTCCTGCCTC 2681 
TACTGGGAAG GCTACTTCGG GGCTGGGAAG TCGTCCTTAC TCCTGTGGGA GCCTCGCAAC 2741 
CCGTGCCAAG TCCAGGTCCT GGTGGGGCAG CTCCTCTGTC TCGAGCGCCC TGCAGACCCT 2801 
GCCCTTGTTT GGGGCAGGAG TAGCTGAGCT CACAAGGCAG CAAGGCCCGA GCAGCTGAGC 2861 
AGGGCCGGGG AACTGGCCAA GCTGAGGTGC CCAGGAGAAG AAAGAGGTGA CCCCAGGGCA 2921 
CAGGAGCTAC CTGTGTGGAC AGGACTAACA CTCAGAAGCC TGGGTGCCTG GCTGGCTGAG 2981 
GGCAGTTCGC AGCCACCCTG AGGAGTCTGA GGTCCTGAGC ACTGCCAGGA GGGACAAAGG 3041 
AGCCTGTGAA CCCAGGACAA GCATGGTCCC ACATCCCTGG GCCTGCTGCT GAGAACCTGG 3101 
CCTTCAGTGT ACCGCGTCTA CCCTGGGATT CAGGAAAAGG CCTGGGGTGA CCCGGCACCC 3161 
CCTGCAGCTT GTAGCCAGCC GGGGCGAGTG GCACGTTTAT TTAACTTTTA GTAAAGTCAA 3221 
GGAGAAATGC GGTGG 3236 

(2) INFORMATION FOR SEQ ID NO: 8: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 415 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8: 

Met Val Ser Lys Leu Ser Gln Leu Gln Thr Glu Leu Leu Ala Ala Leu 
1 5 10 15 

Leu Glu Ser Gly Leu Ser Lys Glu Ala Leu Ile Gln Ala Leu Gly Glu 
20 25 30 

Pro Gly Pro Tyr Leu Leu Ala Gly Glu Gly Pro Leu Asp Lys Gly Glu 
35 40 45 

Ser Cys Gly Gly Gly Arg Gly Glu Leu Ala Glu Leu Pro Asn Gly Leu 
50 55 60 

Gly Glu Thr Arg Gly Ser Glu Asp Glu Thr Asp Asp Asp Gly Glu Asp 
65 70 75 80 

Phe Thr Pro Pro Ile Leu Lys Glu Leu Glu Asn Leu Ser Pro Glu Glu 
85 90 95 

Ala Ala His Gln Lys Ala Val Val Glu Thr Leu Leu Gln Glu Asp Pro 
100 105 110 

Trp Arg Val Ala Lys Met Val Lys Ser Tyr Leu Gln Gln His Asn Ile 
115 120 125 

Pro Gln Arg Glu Val Val Asp Thr Thr Gly Leu Asn Gln Ser His Leu 
130 135 140 

Ser Gln His Leu Asn Lys Gly Thr Pro Met Lys Thr Gln Lys Arg Ala 
145 150 155 160 

Ala Leu Tyr Thr Trp Tyr Val Arg Lys Gln Arg Glu Val Ala Gln Gln 
165 170 175 

Phe Thr His Ala Gly Gln Gly Gly Leu Ile Glu Glu Pro Thr Gly Asp 
180 185 190 

Glu Leu Pro Thr Lys Lys Gly Arg Arg Asn Arg Phe Lys Trp Gly Pro 
195 200 205 

Ala Ser Gln Gln Ile Leu Phe Gln Ala Tyr Glu Arg Gln Lys Asn Pro 
210 215 220 

Ser Lys Glu Glu Arg Glu Thr Leu Val Glu Glu Cys Asn Arg Ala Glu 
225 230 235 240 

Cys Ile Gln Arg Gly Val Ser Pro Ser Gln Ala Gln Gly Leu Gly Ser 
245 250 255 

Asn Leu Val Thr Glu Val Arg Val Tyr Asn Trp Phe Ala Asn Arg Arg 
260 265 270 

Lys Glu Glu Ala Phe Arg His Lys Leu Ala Met Asp Thr Tyr Ser Gly 
275 280 285 

Pro Pro Pro Gly Pro Gly Pro Gly Pro Ala Leu Pro Ala His Ser Ser 
290 295 300 

Pro Gly Leu Pro Pro Pro Ala Leu Ser Pro Ser Lys Val His Gly Val 
305 310 315 320 

Arg Gly Gln Pro Ala Thr Ser Glu Thr Ala Glu Val Pro Ser Ser Ser 
325 330 335 

Gly Gly Pro Leu Val Thr Val Ser Thr Pro Leu His Gln Val Ser Pro 
340 345 350 

Thr Gly Leu Glu Pro Ser His Ser Leu Leu Ser Thr Glu Ala Lys Leu 
355 360 365 

Val Ser Ala Ala Gly Gly Pro Leu Pro Arg Gln His Pro Asp Ser Thr 
370 375 380 

Ala Gln Leu Gly Ala Asp Ile Pro Arg Pro Gln Pro Ala Ala Pro Glu 
385 390 395 400 

Pro His His Gly Leu Thr Ser Trp Gly His Asp His Arg Ala Trp 
405 410 415 



(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 13 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ix) FEATURE: 

(A) NAME/KEY: modified_base 

(B) LOCATION: 7 

(D) OTHER INFORMATION: /mod_base= OTHER 
/note= "N = A, C, G, or T" 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9: 
GTTAATNATT ACC 



(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10: 
TACACCACTC TGGCAGCCAC ACT 23 

(2) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11: 
CGGTGGGTAC ATTGGTGACA GAAC 24 

(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12: 
GGCAGGCAAA CGCAACCCAC G 21 

(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13: 
GAAGGGGGGC TCGTTAGGAG C 21 

(2) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14: 
CATGCACAGT CCCCACCCTC A 21 



(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15: 
CTTCCAGCCC CCACCTATGA G 



(2) INFORMATION FOR SEQ ID NO: 16: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 21 base pairs 
(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16: 
GGGCAAGGTC AGGGGAATGG A 



(2) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17: 
CAGCCCAGAC CAAACCAGCA C 



(2) INFORMATION FOR SEQ ID NO: 18: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18: 
CAGAACCCTC CCCTTCATGC C 



(2) INFORMATION FOR SEQ ID NO: 19: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19: 
GGTGACTGCT GTCAATGGGA C 



(2) INFORMATION FOR SEQ ID NO: 20: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20: 
GGCAGACAGG CAGATGGCCT A 



(2) INFORMATION FOR SEQ ID NO: 21: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21: 
GCCTCCCTAG GGACTGCTCC A 



(2) INFORMATION FOR SEQ ID NO: 22: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(Xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22: 
TGGAGCAGTC CCTAGGGAGG C 
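
As listed, SEQ ID NO: 21 and SEQ ID NO: 22 read as reverse complements of one another. The sketch below shows one way to check such a relationship between any two oligonucleotides in the listing; the function name `reverse_complement` is illustrative, and the check says nothing about how the oligonucleotides were used.

```python
# Complement table for the bases that appear in this listing (N pairs with N).
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string; whitespace between blocks is ignored."""
    cleaned = "".join(seq.split()).upper()
    return cleaned.translate(_COMPLEMENT)[::-1]


seq_21 = "GCCTCCCTAG GGACTGCTCC A"   # SEQ ID NO: 21 as listed
seq_22 = "TGGAGCAGTC CCTAGGGAGG C"   # SEQ ID NO: 22 as listed

# True when the two listed sequences are reverse complements of each other.
print(reverse_complement(seq_21) == "".join(seq_22.split()))
```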



(2) INFORMATION FOR SEQ ID NO: 23: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23: 
GTTGCCCCAT GAGCCTCCCA C 

(2) INFORMATION FOR SEQ ID NO: 24: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24: 
GGTCTTGGGC AGGGGTGGGA T 



(2) INFORMATION FOR SEQ ID NO: 25: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25: 
CTGCAATGCC TGCCAGGCAC C 



(2) INFORMATION FOR SEQ ID NO: 26: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26: 
CCCCTGCATC CATTGACAGC C 



(2) INFORMATION FOR SEQ ID NO: 27: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27: 
GAGGCCTGGG ACTAGGGCTG T 



(2) INFORMATION FOR SEQ ID NO: 28: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28: 
CTCTGTCACA GGCCGAGGGA G 



(2) INFORMATION FOR SEQ ID NO: 29: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29: 
CCTGTGACAG AGCCCCTCAC C 



(2) INFORMATION FOR SEQ ID NO: 30: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30: 
CGGACAGCAA CAGAAGGGGT G 



(2) INFORMATION FOR SEQ ID NO: 31: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(Xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31: 
CAGAGCCCCT CACCCCCACA T 



(2) INFORMATION FOR SEQ ID NO: 32: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32: 
GTACCCCTAG GGACAGGCAG G 21 

(2) INFORMATION FOR SEQ ID NO: 33: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33: 
ACCCCCCAAG CAGGCAGTAC A 21 

(2) INFORMATION FOR SEQ ID NO: 34: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 671 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 104..217 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34: 

GCAGAGAGGG CACTGGGAGG AGGCAGTGGG AGGGCGGAGG GCGGGGGCCT TCGGGGTGGG 60 

CGCCCAGGGT AGGGCAGGTG GCCGCGGCGT GGAGGCAGGG AGA ATG CGA CTC TCC 115 
Met Arg Leu Ser 
1 

AAA ACC CTC GTC GAC ATG GAC ATG GCC GAC TAC AGT GCT GCA CTG GAC 163 
Lys Thr Leu Val Asp Met Asp Met Ala Asp Tyr Ser Ala Ala Leu Asp 
5 10 15 20 

CCA GCC TAC ACC ACC CTG GAA TTT GAG AAT GTG CAG GTG TTG ACG ATG 211 
Pro Ala Tyr Thr Thr Leu Glu Phe Glu Asn Val Gln Val Leu Thr Met 
25 30 35 

GGC AAT GGTAGGTGGG GGCAGATGTG CCCAGGTGTG CCAGTGGGGG CAGGTGTGCC 267 
Gly Asn 

TGGGTCCAGG AGCAGATCTT TGGCACTCAA CTTTGGGGTG GGAGGAGAAT GATACAAAAT 327 
GGTAGGTTGG TCCTACAGGC CAGCACAGGT GTTGCCAAGT GAAGCCCATG TGCCCAGGCA 387 
CAGTGATCAC AGGCATTCTG GGTGAAGGGA GGCCTGCAAG GGCCAATTTC CAGCAAAAGT 447 
CGATCCCGGC TATTCCTCCC AGGCCCTTCC AGTCCTCACT GCCTCACAGT GGCTCTGCTT 507 
GGCGCTTV3GC ACAGTGACAT GATGGTGAGC TCCCCCTTGG TGCCCAGCrC CAGCGATTCA 567 
GCCCAGCACG GCCCCTTCGT GAACCCCTTG GGCCTAGGTT CAGAGAGACG GCAAGGGATG 627 
TTGTATCCCT GGAGATGGTG GTTGGAGACA TAACCGCATT TCTC 671 

(2) INFORMATION FOR SEQ ID NO: 35: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 38 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35: 

Met Arg Leu Ser Lys Thr Leu Val Asp Met Asp Met Ala Asp Tyr Ser 
1 5 10 15 

Ala Ala Leu Asp Pro Ala Tyr Thr Thr Leu Glu Phe Glu Asn Val Gln 
20 25 30 

Val Leu Thr Met Gly Asn 
35 



(2) INFORMATION FOR SEQ ID NO: 36: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 796 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: join(286..312, 316..375) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36: 

TGGATGTTTG TACATGTGTG CTGTGTGTGC GGGTCATAGA GCACATGTGT TTGTGCATGC 60 
GGACCTGTTG GAGTGCCCTG TTCTTCCTGC ATCTTTATCC TGTATGGGCG TTTTGTCGTG 120 
TGCCCATATT TGTACCTGCT GTGTATATAT GCAGTTCCCT GTGCTGCGGG CGGGGGTCAG 180 
CGGTCTCTGG TGTGCACGAC TGCACAGACC CAAATGCAGG ACTCTGTTGT TGCCACTCAC 240 

CAAGTGAGAT TCATATCAGC AACATGTCCG TTTGTCTCTG AGCAG ATT TGT TGC 294 
Ile Cys Cys 
1 

CGC TGC GTC TCG CCA GAT TGA GGC ATC CCC TCC GAC ATC ACT GGA GCA 342 
Arg Cys Val Ser Pro Asp Gly Ile Pro Ser Asp Ile Thr Gly Ala 
5 10 15 

TAT CTG GAG GGG TGG ACA GTT CTC CAC AGG GAG GTAGGGGAAA AGAGGAGGCC 395 
Tyr Leu Glu Gly Trp Thr Val Leu His Arg Glu 
20 25 

CGGAAACCCC TCCTGGAGGG AAGAGCCCCA TCGGTCCCAG GCCAGCCTCA GAGGAGAGGG 455 
GGCAGGCAGC TGGCTGAGGT CAGCCTGCCA CCC^CTTCC TTCTGTGTCT TGGAGCCACT 515 
CAGCCAGTAT GAGGCTGCAG CTCCAGCTGA GGTCTGGAAT CTTGTGGTCA GCTCAGCTAG 575 
GGTGAGGAGG CAGCTGCTGG GCACTGCTTG TTGTCAGCTC AGCAGGTGCT CACC^CCCC 635 
TGCCGTCCAG TCACGTGTGA CCTTGGGCAT GTCACCTCCC CTATCCTGGC TTCTGTATCT 695 
TCTACAAAAC AGGCTTCATT CCCCCAGGCC TGCTGGCTGG ACGGCTTTTA GGCCTGTCTG 755 
AGGACCACGC CAGGAGCGCA AGGCAAAAAG ACACCAGAGA T 796 

(2) INFORMATION FOR SEQ ID NO: 37: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 29 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37: 
Ile Cys Cys Arg Cys Val Ser Pro Asp Gly Ile Pro Ser Asp Ile Thr 
1 5 10 15 

Gly Ala Tyr Leu Glu Gly Trp Thr Val Leu His Arg Glu 
20 25 

(2) INFORMATION FOR SEQ ID NO: 38: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 634 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 326..499 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 38: 
CCCCTTGCGA GTTAGGAGGC CGGCTCCCAC CCCAGAAGGT GGCCAGGTTT TCATGCCTTC 60 
CTAGAGAAAG CTGGGGCTGG TGGCCTCCAC CACAGGGAGA CGCAGACCCT CAGAAACAAG 120 
TCTGTGAAGT CACAACCAGC CCCAGTTTAC AGATGTGAAA CTGAAGCTCC AAAAAGTCAG 180 
180 
GACCTCACTO AGTGGGGAGG TOATGGAGTG 

GCCCTGGAGA MICCCCOOt M0CTc:ccTT CATTCTGTTC TO CTGAAGC 

CTOCTCCCT TCTCTCCTGG GGCAO ACA CGT CCC CAT C» W SCA ^ icc 

Thr Arg Pro His Gin Lys Al. Pro Thr 

2 S 2 2 S 2 2 2 » - « 2 2 2 2 2 

20 25 

S S 3 .2 2 22" 2 2 2 E 2 2 SI 



35 40 



S 2 2 £ 2 2 2 5= r "* ACC sc * ™ « c « 

4s Giy Gly Ala Cys Gly Arg Thr Thr Cys Thr Pro 



50 ss 



Ala GGTCAGGAGC CTCAATTTCT TCAGCTGGGA AATGGGCACA CTTGGGCTCA 

TGGCCCCAAG GTCTGTCTTC TCCCTOAGTG GGTAGGTCCC AGAGACAGCT GCCCTTCAGG 
GCCTTCAAGG CTCTTCTGGT TTTGT 

(2) INFORMATION FOR SEQ ID NO: 39: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 58 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39: 

Thr Arg Pro His Gln Lys Ala Pro Thr Ser Thr Arg Pro Thr Ala Trp 
Val Ser Ala Pro Cys Val Pro Ser Ala Gly Thr Gly Pro Arg Ala Asn 

Thr Thr Val P ro Arg Ma Val Thr ^ ^ ^ ^ ^ ^ 

40 4 5 

Ala Cys Gly Arg Thr Thr Cys Thr Pro Ala 
50 55 



(2) INFORMATION FOR SEQ ID NO : 40: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 458 base pairs 

(B) TYPE: nucleic acid 



240 
300 
352 

400 
448 

496 

54 9 

609 
634 






(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: join(171..173, 177..265) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 40: 

AGAGAGTTCA TAGCACCTTT CCAGCTCCTG GTGGGTTCAA GAGAGAACTC CCGGGATGAA 60 

GAGATGAGAG CACTGAGGTT GGGGGGTCAA CTGGATAGCC AGGGCCCTAG TTCTGTCCTA 120 

AGAGGAGGAA GTTGTGTCTT CTCCATCCAA CCATCCAAAG CCCTCCCCAG ATT 173 
Ile 
1 

TAG CCG GCA GTG CGT GGT GGA CAA AGA CAA GAG GAA CCA GTG CCG CTA 221 
Pro Ala Val Arg Gly Gly Gln Arg Gln Glu Glu Pro Val Pro Leu 
5 10 15 

CTG CAG GCT CAA GAA ATG CTT CCG GGC TGG CAT GAA GAA GGA 263 
Leu Gln Ala Gln Glu Met Leu Pro Gly Trp His Glu Glu Gly 
20 25 30 

AGGTGAGCCT CGGCCCTCCC CGCCCCACCA CCACTGCCCC ACCTGCACCC ACAGCTCCCC 323 

GACAGTCATT TACAACTGTA GCCACACTTT ATGACTCAGT GGCAGGCCCC AGGGTGACTG 383 

GCTAATGGCT GAGAAGAGGG AGGGCCTGGA AATCTGACCA TAGGGAGCGG CTGGGCTTGG 443 

TCTTGAGAAA GATTC 458 



(2) INFORMATION FOR SEQ ID NO: 41: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 41: 

Ile Pro Ala Val Arg Gly Gly Gln Arg Gln Glu Glu Pro Val Pro Leu 
1 5 10 15 

Leu Gln Ala Gln Glu Met Leu Pro Gly Trp His Glu Glu Gly 
20 25 30 



(2) INFORMATION FOR SEQ ID NO: 42: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 662 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 84..188 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 42: 

TCCCACTCCT CATCAGTCAC AGACACCCCC ACCCCCTACT CCATCCCTGT TCTCCCTCCT 6 0 

CACCTCTCTG TGCCTCCTCA CAG CCG TCC AGA ATG AGC GGG ACC GGA TCA 110 

Pro Ser Arg Met Ser Gly Thr Gly Ser 
1 5 

GCA CTC GAA GGT CAA GCT ATG AGG ACA GCA GCC TGC CCT CCA TCA ATG 158 
Ala Leu Glu Gly Gln Ala Met Arg Thr Ala Ala Cys Pro Pro Ser Met

10 15 20 25 

CGC TCC TGC AGG CGG AGG TCC TGT CCC GAC AGGTACCGGG GTGATCCTGC 208
Arg Ser Cys Arg Arg Arg Ser Cys Pro Asp
30 35

CACCCACCCA GGGGATCCCC CACACTACAG AGGAGCTCAC CTCCTCCACC TCCATTCTCC 268

CCAGCCAGGC CCTGGAGCAG CTGACGGGAG GGGCCTCAGA TATTACAGAA GGGACACTGA 328

GTGCGGTTTC ACATGGCCCA GTTTGCAGCA AGGGCAGGAA TCGAACCTGG CGCCCTGGGG 388

CACTTTCTAA TTCATCCTAC TGCCTGCATC CCACAGGCCA AGCAGAGTCT TCACCTTCAC 448

TGAGGGCCTG CGATCAGCTC AGCTCCGAGA GAACAGAGCA GTGGCTCAGT GGAGAGAGGT 508

GGCAAAGTGG GGCCCAGCCC TTCCCTTGCT GAGTGACCTT GGGCAAGTCA CAGCACCTCT 568

CTGAGCCATG GTTGCCTCAT TGTCAGAAAA GGATGATGAT TTTTTGCCCT GCTTCTCCTC 628

TAAGGCTGAC AGACTCCTTG GGGCTCTAAA GCTG 662



(2) INFORMATION FOR SEQ ID NO: 43: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 35 amino acids

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 43: 

Pro Ser Arg Met Ser Gly Thr Gly Ser Ala Leu Glu Gly Gln Ala Met
1 5 10 15 

Arg Thr Ala Ala Cys Pro Pro Ser Met Arg Ser Cys Arg Arg Arg Ser 
20 25 30 



Cys Pro Asp 
35 
(2) INFORMATION FOR SEQ ID NO: 44:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 647 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 185..340

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 44: 

TTCTCCCTCA TCCCTGCCTC CTCCCTCCCT CCGTTTTTAC CCTGAGCTTC CTTCAGAGCT 60 

GGAGGGCACC CACTATCCAG CCCCCTCCCC ACATCTGATT CCAGGGAGGG GGCTCTGTGC 120 

AGGGGACAGA GAATGCGGGA GGGCCCGGAC ATCTCCAGCA TTTTCTTCCC TGTATCTCTC 180

GAAG ATC ACC TCC CCC GTC TCC GGG ATC AAC GGC GAC ATT CGG GCG AAG 229
Ile Thr Ser Pro Val Ser Gly Ile Asn Gly Asp Ile Arg Ala Lys
1 5 10 15

AAG ATT GCC AGC ATC GCA GAT GTG TGT GAG TCC ATG AAG GAG CAG CTG 277
Lys Ile Ala Ser Ile Ala Asp Val Cys Glu Ser Met Lys Glu Gln Leu
20 25 30

CTG GTT CTC GTT GAG TGG GCC AAG TAC ATC CCA GCT TTC TGC GAG CTC 325
Leu Val Leu Val Glu Trp Ala Lys Tyr Ile Pro Ala Phe Cys Glu Leu
35 40 45

CCC CTG GAC GAC CAG GTGAGGATGG GCGTGGATGG TGGGCAGTAG TGGGCAGTGG 380
Pro Leu Asp Asp Gln
50 

GCGGGGCAGC CAGGGGGCTG CTGGCCCACC TGGGATATAG CCGTGGACTG GCTTGATTTT 440 

ATTTTATTTA ACAAAATATG TAGTGCACAC ACGTGTCTGA AACTTTAAAT CACCTTACAA 500 

ATATTAACTC AGTTAGCTCC TCCAACAACT CTATGAGGTA GGTACTAAGG TACTATTATT 560 

ACTGCCATCT CATAGGTGAG AGATTGGGGC ACAGAGAGGT TAAGTAACCT GCTCAAGGTC 620 

ACATAGCTAC TATCCAGCAT AGCTGGG 647 

(2) INFORMATION FOR SEQ ID NO: 45: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 52 amino acids 
(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 45:

Ile Thr Ser Pro Val Ser Gly Ile Asn Gly Asp Ile Arg Ala Lys Lys
1 5 10 15

Ile Ala Ser Ile Ala Asp Val Cys Glu Ser Met Lys Glu Gln Leu Leu
20 25 30

Val Leu Val Glu Trp Ala Lys Tyr Ile Pro Ala Phe Cys Glu Leu Pro
35 40 45

Leu Asp Asp Gln
50 



(2) INFORMATION FOR SEQ ID NO: 46: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 844 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 429..515

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 46: 

ATTTTTACAA AGCACCCTTC ATAATTCTCC ATAGCTGGTC CATGGGTGGG AATTTGGGAC 60 

CCACAGTTTT GGAACTTTTT GGGATCATAG ACCTTTTTGA GAATCTCAAA AAAGAAAAAA 120 

AAGCACACAG AATGTTGCTT ACAGTTTCAT CAGGCACACA GAAGAGGCCC AGCACGAAGC 180 

AGTTTCTTGC CCAAGGACAC AGCAGTTCAA GGACAGAGTC AGCGCGAGGT CTCTCAGCTC 240 

TGAGCACATG TTCTTTCCCC TTCCAGGTTT CTAGTTTTAT GGGTAGTAGT TTTATGATGC 300

CCATTTCACA GTTCAGGCAG GTAGAGGCAG AGGGGAGCAT TAAGCTGACT TGCCCAGCGT 360

CACTGAGTTG GCTACGGGCA GCCTTCCCAA GGGTACAGAT GGCAAACACT GTTCCTTATC 420

TCTTTCAG GTG GCC CTG CTC AGA GCC CAT GCT GGC GAG CAC CTG CTG CTC 470

Val Ala Leu Leu Arg Ala His Ala Gly Glu His Leu Leu Leu 
1 5 10 

GGA GCC ACC AAG AGA TCC ATG GTG TTC AAG GAC GTG CTG CTC CTA 515 
Gly Ala Thr Lys Arg Ser Met Val Phe Lys Asp Val Leu Leu Leu 
15 20 25 

GGTGAGGCGG CTGCCTGCCC TGGCCAGGGC TCCAGGGAGG GTATGCCTAG CATGGCACTC 575 

ACCCAGGCAA GGAGATTCAC ATGGTGGCAT GCAAGGGTGA GGGAGACTAG TCAGGAGTGG 635 

CCCTGTCCTC AGGCTTGCAT TGGAGGGCTC CAGGACTCAG TTTTCAACTG GGTACCCCAC 695 
TCAGATGCAA GGAAATGTGG ATGCAAGTCA CCAAATTCCC AGCATTGAAG TCAGAGCACG 755
ATCAGGGTTA TCCCTGGAAT TACCTGTGCA TCCTTTTTTC TTTTGACAGA GTCTTGCTCT 815 
GTCACTCAGG CTGGAGTGCA ATGATGTGA 844 

(2) INFORMATION FOR SEQ ID NO: 47: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 29 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 47: 

Val Ala Leu Leu Arg Ala His Ala Gly Glu His Leu Leu Leu Gly Ala 
1 5 10 15

Thr Lys Arg Ser Met Val Phe Lys Asp Val Leu Leu Leu 
20 25 

(2) INFORMATION FOR SEQ ID NO: 48:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 937 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: join(485..529, 533..640)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 48: 

GCAACACTAG TATTTTAATA TAACAATGCT ATGAGGGAGC TCGATTATTT ATCCTCATCT 60 

TATAGATAAG AAAACTGAGG CACAGAGAGG TTAAGTAACT TATCCAACTA TAACCAGCTA 120 

TCAGGGGCAG AGCCATTTAA GCAGGGCAGT GCAGTTCCAG AATCTGGTCC TTTAACCTTG 180

ATGCTTTGGT GCCTATCAGG TGACCTTTGA ATGTCATCGA TCTTGTGAGT CATGTTGGTA 240

AATGGAGCTT GGGTCATGTG AAAGAGGTCC TAGAAAGCCA AGTTCCAAGC TCAGCCGGAT 300

GACTCAAGGC AGCTTATCTT CTGAATCTGG GCCTCAGCTT CCTTACCTGT GAAATGGGAG 360 

TCACCATCCC TGCAGGTCCT CCTCCCACAG GCACCAGCTA TCTTGCCAAC TTAAAAGCCA 420 

AAACTAGAGG AGAGGGGTCA ACCCAAAGTG ACTTCCCATC CTCCCTCCCT CCCAACCCTT 480 

CCAG GCA ATG ACT ACA TTG TCC CTC GGC ACT GCC CGG AGC TGG CGG AGA 529 
Ala Met Thr Thr Leu Ser Leu Gly Thr Ala Arg Ser Trp Arg Arg 
1 5 10 15

TGA GCC GGG TGT CCA TAC GCA TCC TTG ACG AGC TGG TGC TGC CCT TCC 577
Ala Gly Cys Pro Tyr Ala Ser Leu Thr Ser Trp Cys Cys Pro Ser 
20 25 30 

AGG AGC TGC AGA TCG ATG ACA ATG AGT ATG CCT ACC TCA AAG CCA TCA 625 
Arg Ser Cys Arg Ser Met Thr Met Ser Met Pro Thr Ser Lys Pro Ser 
35 40 45 

TCT TCT TTG ACC CAG GTACAGTGCA CACCTCCTAA GCCATCCCTG ACTCTCTCTC 680 
Ser Ser Leu Thr Gln
50 

CAGAACGCTC TGCCAGACTT CTCCTATTGG GTTCTGTACA CTGAGTTCAC AGCCTCATCT 740 

CATGTTAACG ACAGCCAGGA GAGGCCGTTT TCATTTAACA GATGAGGCAA GTCAAGATTT 800 

GAAGAGACAA TATGGCCGGG CGCAGTGGCT CACACCTGTA ATCCCATCAC TTTGGGAGGC 860

TGAGGCGGGC GGATCACCTG AGGTCAGGGG TCAAGATGAG CCTGGCTAAC ATGGAGAAAC 920
CCCATCTCTA CTTAAAA 937



(2) INFORMATION FOR SEQ ID NO: 49: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 51 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 49: 

Ala Met Thr Thr Leu Ser Leu Gly Thr Ala Arg Ser Trp Arg Arg Ala 
1 5 10 15 

Gly Cys Pro Tyr Ala Ser Leu Thr Ser Trp Cys Cys Pro Ser Arg Ser 
20 25 30 

Cys Arg Ser Met Thr Met Ser Met Pro Thr Ser Lys Pro Ser Ser Ser 
35 40 45

Leu Thr Gln
50 



(2) INFORMATION FOR SEQ ID NO: 50: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 978 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ix) FEATURE: 

(A) NAME/KEY: CDS 
(B) LOCATION: join(376..387, 391..432, 436..534, 538..610)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 50:

GTGGCTCTGC CAACAACTGG CTGTGCGACC CAGGACAAGT CCTATCTTTG CACTGTGTCT 60

GGGTTTCCCC GTGTGTAAGA TGAGGCGGTT GCTAGGTGCT TATTGGATGC ATTCCTCAAG 120

TCCCGCCCTC CATCTCCTAT TCCCCTCTCT TCTGGTTTAG TGCTTTAGGA AATGTGGCAG 180

AAATCTTTTT CTGCCTGTGT CTAGGAAATC ATAATTCATG CTGGCGTACC CTGGTTGTTG 240

AGGTCCCTGA ATCCTTGTGC CCACACTGCT GAAGACTCCT TGTGTGACAC AAGTCAGGGG 300

ACATCTGGGT CTTGACTCCC CAGATGCTCC AGGTGGACCC TGCTGCCCTC CCTTGCCCAC 360

CCTCTTCCAT TGTAG ATG CCA AGG GGC TGA GCG ATC CAG GGA AGA TCA AGC 411
Met Pro Arg Gly Ala Ile Gln Gly Arg Ser Ser
1 5 10

GGC TGC GTT CCC AGG TGC AGG TGA GCT TGG AGG ACT ACA TCA ACG ACC 459
Gly Cys Val Pro Arg Cys Arg Ala Trp Arg Thr Thr Ser Thr Thr
15 20 25

GCC AGT ATG ACT CGC GTG GCC GCT TTG GAG AGC TGC TGC TGC TGC TGC 507
Ala Ser Met Thr Arg Val Ala Ala Leu Glu Ser Cys Cys Cys Cys Cys
30 35 40

CCA CCT TGC AGA GCA TCA CGT GGC AGA TGA TCG AGC AGA TCC AGT TCA 555
Pro Pro Cys Arg Ala Ser Arg Gly Arg Ser Ser Arg Ser Ser Ser
45 50 55

TCA AGC TCT TCG GCA TGG CCA AGA TTG ACA ACC TGT TGG AGG AGA TGC 603
Ser Ser Ser Ser Ala Trp Pro Arg Leu Thr Thr Cys Trp Arg Arg Cys
60 65 70

TGC TGG GAGGTCCGTG CCAAGCCCAG GAGGGGCGGG GTTGGATTGG GGACTCCCCA 659
Cys Trp
75

GGAGACAGGC CTCACACAGT GAGCTCACCC CTCAGCTCCT TGGCTTCCCC ACTGTGCCGC 719
TTTGGGCAAG TTGCTTAACC TGTCTGTGCC TCAGTTTCCT CACCAGAAAA ATGGGAACAA 779
GGCAATGGTC TATTTGTTCA GGCACCGAGA ACCTAGCACG TGCCAGTCAC TGTTCTAAGT 839
GCTGGCAATT CAGCAAAGAA CAAGATCTTT GCCCTCGGGG AGGCTGTGTG TGTGTGATAT 899
GTATGGATGC GTGGATATCT GTGTATATGC CCGTATGTGC GTGCATGTGT ATATAAAGCC 959
TCACATTTTA TGATTTTGA 978

(2) INFORMATION FOR SEQ ID NO: 51:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 75 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 51:

Met Pro Arg Gly Ala Ile Gln Gly Arg Ser Ser Gly Cys Val Pro Arg
1 5 10 15

Cys Arg Ala Trp Arg Thr Thr Ser Thr Thr Ala Ser Met Thr Arg Val
20 25 30

Ala Ala Leu Glu Ser Cys Cys Cys Cys Cys Pro Pro Cys Arg Ala Ser
35 40 45

Arg Gly Arg Ser Ser Arg Ser Ser Ser Ser Ser Ser Ser Ala Trp Pro
50 55 60

Arg Leu Thr Thr Cys Trp Arg Arg Cys Cys Trp
65 70 75



(2) INFORMATION FOR SEQ ID NO: 52: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 984 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: join(443..490, 494..595)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 52: 
GGGACACATA GATGCTATAA GTAGGTCAGT TGGCTGCAGC AGAGATGTGG GGGATGAGGC 60
TGAAAGGTGA GGCGGGACCA AATGGTTGAA GGACTTGCAC TCCAAGGAGC TTTGAGAGCC 120
ATTGATTACA TCCATTATGT TACTATGTGA CCAATACATT ACTCATTAGA ACATTTACGT 180
GATCTCAGAG CTTCCTTATA TGCACCTTGT TCCTTTCAAC TCACTTTTGT TCTCTTGGTT 240
TTTTGGGGTC CTCTTAACAC CCTCATGAAG TCTATAGATG GGAATGGTAC ACCCTAGTTT 300
ACTAACCCAG GAATAGGTAC CCAACAGGCA CTGCCAATAT TGGATGGGCT GGTTGATTGG 360
CCACGCCTGA GGAAGATGGC GTCCCAAGGC CTGAGGTCTG CATCCCAGAC TCTCCATCCT 420
GATCGACCTT CTCTACCTGC AG GGT CCC CCA GCG ATG CAC CCC ATG CCC ACC 472
Gly Pro Pro Ala Met His Pro Met Pro Thr
1 5 10

ACC CCC TGC ACC CTC ACC TGA TGC AGG AAC ATA TGG GAA CCA ACG TCA 520
Thr Pro Cys Thr Leu Thr Cys Arg Asn Ile Trp Glu Pro Thr Ser
15 20 25






TCG TTG CCA ACA CAA TGC CCA CTC ACC TCA GCA ACG GAC AGA TGT GTG 568
Ser Leu Pro Thr Gln Cys Pro Leu Thr Ser Ala Thr Asp Arg Cys Val
30 35 40

AGT GGC CCC GAC CCA GGG GAC AGG CAG GTGGGCAAAC TCTGGGATTT 615
Ser Gly Pro Asp Pro Gly Asp Arg Gln
45 50

TACCTTGCAA AGGGTGAGGA TGGGGCTTAA GACAGGAGGC AGGAGAAAGT GGAGTCTAGA 675

AGGTAGAACC AGGATGCAAC AGTTTTCTGG GTTCCAGGGT AGGGAATAAA GGGCAAGATT 735 

GTCCATTTGT TGAGGCTGTT TATTCAGTAA GGTGACTGAC AGCCTTTACT GAATGAAGCC 795 

ATTGTTGGGA TGAGGCAATC CACTGGATGA GGTAACCCAT TGGGTGAAGA TGTCTTGGGT 855 

GAGAATTCCA TTAGTTGACA TTGTCCATTA AGTAAAAGTG GTCATTGAAG TAAGGCTGCA 915 

CAGTTGGGTA AGGCTATCCA TTAGACATTA GATGAGACTA CCCATTGGGT CAGGATGTCT 975 

GCTGGGCTA 984



(2) INFORMATION FOR SEQ ID NO: 53: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 53:

Gly Pro Pro Ala Met His Pro Met Pro Thr Thr Pro Cys Thr Leu Thr
1 5 10 15

Cys Arg Asn Ile Trp Glu Pro Thr Ser Ser Leu Pro Thr Gln Cys Pro
20 25 30

Leu Thr Ser Ala Thr Asp Arg Cys Val Ser Gly Pro Asp Pro Gly Asp
35 40 45

Arg Gln
50 



(2) INFORMATION FOR SEQ ID NO: 54: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1103 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ix) FEATURE:

(A) NAME/KEY: CDS
(B) LOCATION: join(289..429, 433..477, 481..492, 496..603, 607..630, 634..750, 754..810, 814..843, 847..1023, 1027..1071, 1075..1103)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 54:

TTTGGGAGAA GCAGTCCAAG TCTGCATATC AAATAAATGA TGGAGGAGAT GGGTGGTAGG 60 

ACCTTCCAGA CCTCATAAAA CTTAGGCTTT ATGATCTGGG ACTCACAGAA GGTTGAGCAA 120

TAAAAGACCT TAGGGATTAT CTGGCTTAAT TAATTCTCTC ATTTTATAGA GGAAGAAATT 180

AAGTCAAGGT GGGGCAGGGT GGGAGGGGAG AACTTTCCCG GGGCTCTTCA TTTACTCCCA 240

CAAAGGCTGG AATTTTGAGC AGCCCCTGTC TGTCTGTTTG TCCTTCCA GCC ACC CCT 297 

Ala Thr Pro 
1 

GAG ACC CCA CAG CCC TCA CCG CCA GGT GGC TCA GGG TCT GAG CCC TAT 345
Glu Thr Pro Gln Pro Ser Pro Pro Gly Gly Ser Gly Ser Glu Pro Tyr
5 10 15

AAG CTC CTG CCG GGA GCC GTC GCC ACA ATC GTC AAG CCC CTC TCT GCC 393
Lys Leu Leu Pro Gly Ala Val Ala Thr Ile Val Lys Pro Leu Ser Ala
20 25 30 35

ATC CCC CAG CCG ACC ATC ACC AAG CAG GAA GTT ATC TAG CAA GCC GCT 441
Ile Pro Gln Pro Thr Ile Thr Lys Gln Glu Val Ile Gln Ala Ala
40 45 50

GGG GCT TGG GGG CTC CAC TGG CTC CCC CCA GCC CCC TAA GAG AGC ACC 489
Gly Ala Trp Gly Leu His Trp Leu Pro Pro Ala Pro Glu Ser Thr
55 60 65

TGG TGA TCA CGT GGT CAC GGC AAA GGA AGA CGT GAT GCC AGG ACC AGT 537
Trp Ser Arg Gly His Gly Lys Gly Arg Arg Asp Ala Arg Thr Ser
70 75 80

CCC AGA GCA GGA ATG GGA AGG ATG AAG GGC CCG AGA ACA TGG CCT AAG 585
Pro Arg Ala Gly Met Gly Arg Met Lys Gly Pro Arg Thr Trp Pro Lys
85 90 95

GCA CAT CCC ACT GCA CCC TGA CGC CCT GCT CTG ATA ACA AGA CTT 630
Ala His Pro Thr Ala Pro Arg Pro Ala Leu Ile Thr Arg Leu
100 105 110

TGA CTT GGG GAG ACC CTC TAC TGC CTT GGA CAA CTT TCT CAT GTT GAA 678
Leu Gly Glu Thr Leu Tyr Cys Leu Gly Gln Leu Ser His Val Glu
115 120 125

GCC ACT GCC TTC ACC TTC ACC TTC ATC CAT GTC CAA CCC CCG ACT TCA 726
Ala Thr Ala Phe Thr Phe Thr Phe Ile His Val Gln Pro Pro Thr Ser
130 135 140

TCC CAA AGG ACA GCC GCC TGG AGA TGA CTT GAG CCT TAC TTA AAC CCA 774
Ser Gln Arg Thr Ala Ala Trp Arg Leu Glu Pro Tyr Leu Asn Pro
145 150 155

GCT CCC TTC TTC CCT AGC CTG GTG CTT CTC CTC TCC TAG CCC CGG TCA 822
Ala Pro Phe Phe Pro Ser Leu Val Leu Leu Leu Ser Pro Arg Ser
160 165 170

TGG TGT CCA GAC AGA GCC CTG TGA GGC TGG GTC CAA TTG TGG CAC TTG 870
Trp Cys Pro Asp Arg Ala Leu Gly Trp Val Gln Leu Trp His Leu
175 180 185

GGG CAC CTT GCT CCT CCT TCT GCT GCT GCC CCC ACC TCT GCT GCC TCC 918
Gly His Leu Ala Pro Pro Ser Ala Ala Ala Pro Thr Ser Ala Ala Ser
190 195 200

CTC TGC TGT CAC CTT GCT CAG CCA TCC CGT CTT CTC CAA CAC CAC CTC 966
Leu Cys Cys His Leu Ala Gln Pro Ser Arg Leu Leu Gln His His Leu
205 210 215

TAC AGA GGC CAA GGA GGC CTT GGA AAC GAT TCC CCC AGT CAT TCT GGG 1014
Tyr Arg Gly Gln Gly Gly Leu Gly Asn Asp Ser Pro Ser His Ser Gly
220 225 230

AAC ATG TTG TAA GCA CTG ACT GGG ACC AGG CAC CAG GCA GGG TCT AGA 1062
Asn Met Leu Ala Leu Thr Gly Thr Arg His Gln Ala Gly Ser Arg
235 240 245

AGG CTG TGG TGA GGG AAG ACG CCT TTC TCC TCC AAC CCA AC 1103
Arg Leu Trp Gly Lys Thr Pro Phe Ser Ser Asn Pro
250 255 260



(2) INFORMATION FOR SEQ ID NO: 55: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 261 amino acids 
(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 55:

Ala Thr Pro Glu Thr Pro Gln Pro Ser Pro Pro Gly Gly Ser Gly Ser
1 5 10 15

Glu Pro Tyr Lys Leu Leu Pro Gly Ala Val Ala Thr Ile Val Lys Pro
20 25 30

Leu Ser Ala Ile Pro Gln Pro Thr Ile Thr Lys Gln Glu Val Ile Gln
35 40 45

Ala Ala Gly Ala Trp Gly Leu His Trp Leu Pro Pro Ala Pro Glu Ser
50 55 60

Thr Trp Ser Arg Gly His Gly Lys Gly Arg Arg Asp Ala Arg Thr Ser
65 70 75 80

Pro Arg Ala Gly Met Gly Arg Met Lys Gly Pro Arg Thr Trp Pro Lys
85 90 95

Ala His Pro Thr Ala Pro Arg Pro Ala Leu Ile Thr Arg Leu Leu Gly
100 105 110

Glu Thr Leu Tyr Cys Leu Gly Gln Leu Ser His Val Glu Ala Thr Ala
115 120 125

Phe Thr Phe Thr Phe Ile His Val Gln Pro Pro Thr Ser Ser Gln Arg
130 135 140

Thr Ala Ala Trp Arg Leu Glu Pro Tyr Leu Asn Pro Ala Pro Phe Phe
145 150 155 160

Pro Ser Leu Val Leu Leu Leu Ser Pro Arg Ser Trp Cys Pro Asp Arg
165 170 175

Ala Leu Gly Trp Val Gln Leu Trp His Leu Gly His Leu Ala Pro Pro
180 185 190

Ser Ala Ala Ala Pro Thr Ser Ala Ala Ser Leu Cys Cys His Leu Ala
195 200 205

Gln Pro Ser Arg Leu Leu Gln His His Leu Tyr Arg Gly Gln Gly Gly
210 215 220

Leu Gly Asn Asp Ser Pro Ser His Ser Gly Asn Met Leu Ala Leu Thr
225 230 235 240

Gly Thr Arg His Gln Ala Gly Ser Arg Arg Leu Trp Gly Lys Thr Pro
245 250 255 

Phe Ser Ser Asn Pro 
260 



(2) INFORMATION FOR SEQ ID NO: 56: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 56:
GGGCACTGGG AGGAGGCAGT 20



(2) INFORMATION FOR SEQ ID NO: 57: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 57: 

(2) INFORMATION FOR SEQ ID NO: 58:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 58:
TCTGGTGTGC ACGACTGCAC 20

(2) INFORMATION FOR SEQ ID NO: 59: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 59: 
CTGGAGCTGC AGCCTCATAC

(2) INFORMATION FOR SEQ ID NO: 60: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 60:
AAGGCTCCCT TAGATGCCTG 20

(2) INFORMATION FOR SEQ ID NO: 61: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 61: 

CCACTCAGGG AGAAGACAGA CCT 23



(2) INFORMATION FOR SEQ ID NO: 62: 
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 62:
CCTAGTTCTG TCCTAAGAGG



(2) INFORMATION FOR SEQ ID NO: 63: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 63:
GTCATAAAGT GTGGCTACAG



(2) INFORMATION FOR SEQ ID NO: 64: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 64:
CCACCCCCTA CTCCATCCCT GT 



(2) INFORMATION FOR SEQ ID NO: 65: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 65:
CCCTCCCGTC AGCTGCTCCA 



(2) INFORMATION FOR SEQ ID NO: 66: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 66: 
GTGCAGGGGA CAGAGAATGC 20 

(2) INFORMATION FOR SEQ ID NO: 67:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 67: 
AATCAAGCCA GTCCACGGCT AT 22 

(2) INFORMATION FOR SEQ ID NO: 68: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 68:

GCCCAGCGTC ACTGAGTTGG CTA 23 

(2) INFORMATION FOR SEQ ID NO: 69:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 69: 
TTGCCTGGGT GAGTGCCATG 20 



(2) INFORMATION FOR SEQ ID NO: 70:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 70: 
GCACCAGCTA TCTTGCCAAC 20






(2) INFORMATION FOR SEQ ID NO: 71:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 71: 
AGGAGAAGTC TGGCAGAGCG 



(2) INFORMATION FOR SEQ ID NO: 72: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 72: 
CTCCTTGTGT GACACAAGTC 



(2) INFORMATION FOR SEQ ID NO: 73:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 73: 
CTCACTGTGT GAGGCCTGTC 



(2) INFORMATION FOR SEQ ID NO: 74: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 20 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 74:
TGGTTGATTG GCCACGCCTG 



(2) INFORMATION FOR SEQ ID NO: 75: 
(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 75: 
ATCCTGGTTC TACCTTCTAG 20 



(2) INFORMATION FOR SEQ ID NO: 76: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 76: 
CATTTACTCC CACAAAGGCT 20 



(2) INFORMATION FOR SEQ ID NO: 77:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 77: 
GACCACGTGA TCACCAGGTG 20 



(2) INFORMATION FOR SEQ ID NO: 78: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1441 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ix) FEATURE: 

(A) NAME/KEY: CDS

(B) LOCATION: 20..1414

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 78:

CTCCAAAACC CTCGTCGAC ATG GAC ATG GCC GAC TAC AGT GCT GCA CTG GAC 52
Met Asp Met Ala Asp Tyr Ser Ala Ala Leu Asp
1 5 10

CCA GCC TAC ACC ACC CTG GAA TTT GAG AAT GTG CAG GTG TTG ACG ATG 100
Pro Ala Tyr Thr Thr Leu Glu Phe Glu Asn Val Gln Val Leu Thr Met
15 20 25

GGC AAT GAC ACG TCC CCA TCA GAA GGC ACC AAC CTC AAC GCG CCC AAC 148

Gly Asn Asp Thr Ser Pro Ser Glu Gly Thr Asn Leu Asn Ala Pro Asn 
30 35 40 

AGC CTG GGT GTC AGC GCC CTG TGT GCC ATC TGC GGG GAC CGG GCC ACG 196 
Ser Leu Gly Val Ser Ala Leu Cys Ala Ile Cys Gly Asp Arg Ala Thr
45 50 55 

GGC AAA CAC TAC GGT GCC TCG AGC TGT GAC GGC TGC AAG GGC TTC TTC 244 
Gly Lys His Tyr Gly Ala Ser Ser Cys Asp Gly Cys Lys Gly Phe Phe 
60 65 70 75 

CGG AGG AGC GTG CGG AAG AAC CAC ATG TAC TCC TGC AGA TTT AGC CGG 292 
Arg Arg Ser Val Arg Lys Asn His Met Tyr Ser Cys Arg Phe Ser Arg 
80 85 90

CAG TGC GTG GTG GAC AAA GAC AAG AGG AAC CAG TGC CGC TAC TGC AGG 340
Gln Cys Val Val Asp Lys Asp Lys Arg Asn Gln Cys Arg Tyr Cys Arg
95 100 105

CTC AAG AAA TGC TTC CGG GCT GGC ATG AAG AAG GAA GCC GTC CAG AAT 388
Leu Lys Lys Cys Phe Arg Ala Gly Met Lys Lys Glu Ala Val Gln Asn
110 115 120

GAG CGG GAC CGG ATC AGC ACT CGA AGG TCA AGC TAT GAG GAC AGC AGC 436
Glu Arg Asp Arg Ile Ser Thr Arg Arg Ser Ser Tyr Glu Asp Ser Ser
125 130 135

CTG CCC TCC ATC AAT GCG CTC CTG CAG GCG GAG GTC CTG TCC CGA CAG 484
Leu Pro Ser Ile Asn Ala Leu Leu Gln Ala Glu Val Leu Ser Arg Gln
140 145 150 155

ATC ACC TCC CCC GTC TCC GGG ATC AAC GGC GAC ATT CGG GCG AAG AAG 532
Ile Thr Ser Pro Val Ser Gly Ile Asn Gly Asp Ile Arg Ala Lys Lys
160 165 170

ATT GCC AGC ATC GCA GAT GTG TGT GAG TCC ATG AAG GAG CAG CTG CTG 580
Ile Ala Ser Ile Ala Asp Val Cys Glu Ser Met Lys Glu Gln Leu Leu
175 180 185

GTT CTC GTT GAG TGG GCC AAG TAC ATC CCA GCT TTC TGC GAG CTC CCC 628
Val Leu Val Glu Trp Ala Lys Tyr Ile Pro Ala Phe Cys Glu Leu Pro
190 195 200

CTG GAC GAC CAG GTG GCC CTG CTC AGA GCC CAT GCT GGC GAG CAC CTG 676
Leu Asp Asp Gln Val Ala Leu Leu Arg Ala His Ala Gly Glu His Leu
205 210 215

CTG CTC GGA GCC ACC AAG AGA TCC ATG GTG TTC AAG GAC GTG CTG CTC 724
Leu Leu Gly Ala Thr Lys Arg Ser Met Val Phe Lys Asp Val Leu Leu
220 225 230 235

CTA GGC AAT GAC TAC ATT GTC CCT CGG CAC TGC CCG GAG CTG GCG GAG 772
Leu Gly Asn Asp Tyr Ile Val Pro Arg His Cys Pro Glu Leu Ala Glu
240 245 250 
ATG AGC CGG GTG TCC ATA CGC ATC CTT GAC GAG CTG GTG CTG CCC TTC 820
Met Ser Arg Val Ser Ile Arg Ile Leu Asp Glu Leu Val Leu Pro Phe
255 260 265

CAG GAG CTG CAG ATC GAT GAC AAT GAG TAT GCC TAC CTC AAA GCC ATC 868
Gln Glu Leu Gln Ile Asp Asp Asn Glu Tyr Ala Tyr Leu Lys Ala Ile
270 275 280

ATC TTC TTT GAC CCA GAT GCC AAG GGG CTG AGC GAT CCA GGG AAG ATC 916
Ile Phe Phe Asp Pro Asp Ala Lys Gly Leu Ser Asp Pro Gly Lys Ile
285 290 295

AAG CGG CTG CGT TCC CAG GTG CAG GTG AGC TTG GAG GAC TAC ATC AAC 964
Lys Arg Leu Arg Ser Gln Val Gln Val Ser Leu Glu Asp Tyr Ile Asn
300 305 310 315

GAC CGC CAG TAT GAC TCG CGT GGC CGC TTT GGA GAG CTG CTG CTG CTG 1012
Asp Arg Gln Tyr Asp Ser Arg Gly Arg Phe Gly Glu Leu Leu Leu Leu
320 325 330

CTG CCC ACC TTG CAG AGC ATC ACC TGG CAG ATG ATC GAG CAG ATC CAG 1060
Leu Pro Thr Leu Gln Ser Ile Thr Trp Gln Met Ile Glu Gln Ile Gln
335 340 345

TTC ATC AAG CTC TTC GGC ATG GCC AAG ATT GAC AAC CTG TTG CAG GAG 1108
Phe Ile Lys Leu Phe Gly Met Ala Lys Ile Asp Asn Leu Leu Gln Glu
350 355 360

ATG CTG CTG GGA GGG TCC CCC AGC GAT GCA CCC CAT GCC CAC CAC CCC 1156
Met Leu Leu Gly Gly Ser Pro Ser Asp Ala Pro His Ala His His Pro
365 370 375

CTG CAC CCT CAC CTG ATG CAG GAA CAT ATG GGA ACC AAC GTC ATC GTT 1204
Leu His Pro His Leu Met Gln Glu His Met Gly Thr Asn Val Ile Val
380 385 390 395

GCC AAC ACA ATG CCC ACT CAC CTC AGC AAC GGA CAG ATG TGT GAG TGG 1252
Ala Asn Thr Met Pro Thr His Leu Ser Asn Gly Gln Met Cys Glu Trp
400 405 410

CCC CGA CCC AGG GGA CAG GCA GCC ACC CCT GAG ACC CCA CAG CCC TCA 1300
Pro Arg Pro Arg Gly Gln Ala Ala Thr Pro Glu Thr Pro Gln Pro Ser
415 420 425

CCG CCA GGT GCG TCA GGG TCT GAG CCC TAT AAG CTC CTG CCG GGA GCC 1348
Pro Pro Gly Ala Ser Gly Ser Glu Pro Tyr Lys Leu Leu Pro Gly Ala
430 435 440

GTC GCC ACA ATC GTC AAG CCC CTC TCT GCC ATC CCC CAG CCG ACC ATC 1396
Val Ala Thr Ile Val Lys Pro Leu Ser Ala Ile Pro Gln Pro Thr Ile
445 450 455

ACC AAG CAG GAA GTT ATC TAGCAAGCCG CTGGGGCTTG GGGGCTC 1441
Thr Lys Gln Glu Val Ile
460 465

(2) INFORMATION FOR SEQ ID NO: 79:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 465 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 79: 
Met Asp Met Ala Asp Tyr Ser Ala Ala Leu Asp Pro Ala Tyr Thr Thr
1 5 10 15

Leu Glu Phe Glu Asn Val Gln Val Leu Thr Met Gly Asn Asp Thr Ser
20 25 30

Pro Ser Glu Gly Thr Asn Leu Asn Ala Pro Asn Ser Leu Gly Val Ser
35 40 45

Ala Leu Cys Ala Ile Cys Gly Asp Arg Ala Thr Gly Lys His Tyr Gly
50 55 60

Ala Ser Ser Cys Asp Gly Cys Lys Gly Phe Phe Arg Arg Ser Val Arg
65 70 75 80

Lys Asn His Met Tyr Ser Cys Arg Phe Ser Arg Gln Cys Val Val Asp
85 90 95

Lys Asp Lys Arg Asn Gln Cys Arg Tyr Cys Arg Leu Lys Lys Cys Phe
100 105 110

Arg Ala Gly Met Lys Lys Glu Ala Val Gln Asn Glu Arg Asp Arg Ile
115 120 125

Ser Thr Arg Arg Ser Ser Tyr Glu Asp Ser Ser Leu Pro Ser Ile Asn
130 135 140

Ala Leu Leu Gln Ala Glu Val Leu Ser Arg Gln Ile Thr Ser Pro Val
145 150 155 160

Ser Gly Ile Asn Gly Asp Ile Arg Ala Lys Lys Ile Ala Ser Ile Ala
165 170 175

Asp Val Cys Glu Ser Met Lys Glu Gln Leu Leu Val Leu Val Glu Trp
180 185 190

Ala Lys Tyr Ile Pro Ala Phe Cys Glu Leu Pro Leu Asp Asp Gln Val
195 200 205

Ala Leu Leu Arg Ala His Ala Gly Glu His Leu Leu Leu Gly Ala Thr
210 215 220

Lys Arg Ser Met Val Phe Lys Asp Val Leu Leu Leu Gly Asn Asp Tyr
225 230 235 240

Ile Val Pro Arg His Cys Pro Glu Leu Ala Glu Met Ser Arg Val Ser
245 250 255

Ile Arg Ile Leu Asp Glu Leu Val Leu Pro Phe Gln Glu Leu Gln Ile
260 265 270

Asp Asp Asn Glu Tyr Ala Tyr Leu Lys Ala Ile Ile Phe Phe Asp Pro
275 280 285

Asp Ala Lys Gly Leu Ser Asp Pro Gly Lys Ile Lys Arg Leu Arg Ser
290 295 300

Gln Val Gln Val Ser Leu Glu Asp Tyr Ile Asn Asp Arg Gln Tyr Asp
305 310 315 320

Ser Arg Gly Arg Phe Gly Glu Leu Leu Leu Leu Leu Pro Thr Leu Gln
325 330 335

Ser Ile Thr Trp Gln Met Ile Glu Gln Ile Gln Phe Ile Lys Leu Phe
340 345 350

Gly Met Ala Lys Ile Asp Asn Leu Leu Gln Glu Met Leu Leu Gly Gly
355 360 365

Ser Pro Ser Asp Ala Pro His Ala His His Pro Leu His Pro His Leu
370 375 380

Met Gln Glu His Met Gly Thr Asn Val Ile Val Ala Asn Thr Met Pro
385 390 395 400

Thr His Leu Ser Asn Gly Gln Met Cys Glu Trp Pro Arg Pro Arg Gly
405 410 415

Gln Ala Ala Thr Pro Glu Thr Pro Gln Pro Ser Pro Pro Gly Ala Ser
420 425 430

Gly Ser Glu Pro Tyr Lys Leu Leu Pro Gly Ala Val Ala Thr Ile Val
435 440 445

Lys Pro Leu Ser Ala Ile Pro Gln Pro Thr Ile Thr Lys Gln Glu Val
450 455 460

Ile
465 



(2) INFORMATION FOR SEQ ID NO: 80:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2329 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 80:
GGGGCCCTGA TTCACGGGCC GCTGGGGCAG GGTTGGGGGT TGGGGGTGCC CACAGGGTTG 60
GCTAGTGGGG TTTTGGGGGG GCAGTGGGTG CAAGGAGTTT GGTTTGTGTC TGCCGGCCGG 120
CTCTGGGACA AGTGTTTTTC CGTGATTGAG GGTGTCTGCA GGCCAGTGTG TTCCCATGTG 1860

AATGCACGTA TCTGTGTGTG TGCACGACTG CTTGTGTGAG CAGATCCCTA GTCGTGTCTG 1920 

GGTGTGTATC GGTTGTGCAT GCATTTGTGT GCATCCTGTG TTTCTCTGAA ACTCTTAGGG 1980

CCATATGAAT TTCTAAAATC TATTCAGATT TTAGAAAGGT AATCTGGGGC CAGGCGTGGT 2040

GGCTCATGCC TGTAATCCCA GCACTTTGGA AGGCCGAGGT GGGCAGATCA CTTGAGGTCA 2100

GGAGTTCAAG ACCAGCCTGG CCAACACGGT GAAACCCCGT CTCTACTAAA AGTACAAAAA 2160

TTAGCCAGGC GTGGAGCACG TGCCTGTAGT CCCAGCTACT TGGGAGGCTG AGGCAGAATC 2220

GCTTGAACCT GGGAGGCGGA GGTTGCAGTG AGCTGAGATT TGGCCACTGC ACTGCACTCC 2280

AGCCTGGGCA ACAGAGTGAG TACTCTGCCA AAAAAAAAAA AAAAAAAAA 2329

(2) INFORMATION FOR SEQ ID NO: 81: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 81: 
CACCTGGTGA TCACGTGGTC 20 

(2) INFORMATION FOR SEQ ID NO: 82: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 82: 
GTAAGGCTCA AGTCATCTCC 20 

(2) INFORMATION FOR SEQ ID NO: 83: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 5 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS:

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 83: 

Glu Gly Cys Lys Gly
1 5

(2) INFORMATION FOR SEQ ID NO: 84:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 5 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS:

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 84:

Glu Gly Cys Lys Ala 
1 5 



(2) INFORMATION FOR SEQ ID NO: 85:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 5 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 85: 

Asp Gly Cys Lys Gly 
1 5 



(2) INFORMATION FOR SEQ ID NO: 86: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 36 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 1..36

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 86:

GAC ACG TAC AGC GGC CCC CCC CCA GGG CCA GGC CCG
Asp Thr Tyr Ser Gly Pro Pro Pro Gly Pro Gly Pro
1 5 10



(2) INFORMATION FOR SEQ ID NO: 87: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 12 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 87:

Asp Thr Tyr Ser Gly Pro Pro Pro Gly Pro Gly Pro 
1 5 10 



(2) INFORMATION FOR SEQ ID NO: 88: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 36 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 1..36

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 88:

GAC ACG TAC AGC GGC CCC CCC CCC AGG GCC AGG CCC
Asp Thr Tyr Ser Gly Pro Pro Pro Arg Ala Arg Pro
1 5 10



(2) INFORMATION FOR SEQ ID NO: 89:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 12 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 89:

Asp Thr Tyr Ser Gly Pro Pro Pro Arg Ala Arg Pro
1 5 10



(2) INFORMATION FOR SEQ ID NO: 90: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 90: 
CATGAACCCC GAAGAGTGGT G 



(2) INFORMATION FOR SEQ ID NO: 91: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 91: 
GCCTCCAGAC ACCTGTTACT 20 

(2) INFORMATION FOR SEQ ID NO: 92: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 92:

GGCGATCATG GCAAGTTAGA AG 22 



(2) INFORMATION FOR SEQ ID NO: 93: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 93: 
TTGGTGAGAG TATGGAAGAC C 



(2) INFORMATION FOR SEQ ID NO: 94: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 94: 
GGGGTTTGCT TGTGAAACTC C 



(2) INFORMATION FOR SEQ ID NO: 95: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 95:
TTGGTGGGAA ACGGGCTTGG 20

(2) INFORMATION FOR SEQ ID NO: 96: 

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 96:
CTCCCACTAG TACCCTAACC 20

(2) INFORMATION FOR SEQ ID NO: 97: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 97:
GAGAGGGCAA AGGTCACTTC AG 22

(2) INFORMATION FOR SEQ ID NO: 98: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 98: 
AGTGAAGGCT ACAGACCCTA TC 22 

(2) INFORMATION FOR SEQ ID NO: 99: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 99: 

TTCCTGGGTC TGTGTACTTG C 21 



(2) INFORMATION FOR SEQ ID NO: 100:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 100: 
TGTGTTTTGG GCCAAGCACC A 21 

(2) INFORMATION FOR SEQ ID NO: 101: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 101:
AACCAGATAA GATCCGTGGC 20 

(2) INFORMATION FOR SEQ ID NO: 102:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 22 base pairs

(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 102:

AACCAGACTC ACAGCCTGAA CC 22 

(2) INFORMATION FOR SEQ ID NO: 103: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 103: 
TCACAGGGCA ATGGCTGAAC 20 

(2) INFORMATION FOR SEQ ID NO: 104: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 104:
TGCCGAGTCA TTGTTCCAGG 



(2) INFORMATION FOR SEQ ID NO: 105: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 105:
CCTCTTATCT TATCAGCTCC AG 

(2) INFORMATION FOR SEQ ID NO: 106: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 106: 
CTGCTCTTTG TGGTCCAAGT CC 

(2) INFORMATION FOR SEQ ID NO: 107: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 107:
GAGTTTGAAG GAGACCTACA G 



(2) INFORMATION FOR SEQ ID NO: 108: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 108:
ATCCACCTCT CCTTATCCCA G

(2) INFORMATION FOR SEQ ID NO: 109:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 109:
ACTTCCGAGA AAGTTCAGAC C 21

(2) INFORMATION FOR SEQ ID NO: 110: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 110: 
TTTGCCTGTG TATGCACCTT G 21

(2) INFORMATION FOR SEQ ID NO: 111:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 111:
GCCGAGTCCA TGCTTGCCAC 20

(2) INFORMATION FOR SEQ ID NO: 112: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 112:
CTTTGCTGGT TGAGTTGGGC 20

(2) INFORMATION FOR SEQ ID NO: 113: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 21 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 113: 
TTCCATGACA GCTGCCCAGA G 21 



(2) INFORMATION FOR SEQ ID NO: 114: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 114: 
TAAAGGTTGG AGCCCCTCTG 20



(2) INFORMATION FOR SEQ ID NO: 115:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 115:
TTGTAAGGTG ACCCCATCAG 20



(2) INFORMATION FOR SEQ ID NO: 116: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 116: 
TTGGTGATGT CCAGAAGTCC 20



(2) INFORMATION FOR SEQ ID NO: 117: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 117:
CAGAATGTGT CAGAGTTCGC



(2) INFORMATION FOR SEQ ID NO: 118: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 118:
CTCCCTCCTG TTCTTAAGTG 

(2) INFORMATION FOR SEQ ID NO: 119: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 119:

CTGGACTCCC AGTTCAGTCA 



(2) INFORMATION FOR SEQ ID NO: 120:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 120:
CAAGGATCCA GAAGATTGGC 



(2) INFORMATION FOR SEQ ID NO: 121: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 121: 

CGTCCTCTGG GAAGATCTGC 



(2) INFORMATION FOR SEQ ID NO: 122:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 24 base pairs

(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 122: 

GCAACAGAGC AAGACTCCAT CTCA 24 



(2) INFORMATION FOR SEQ ID NO: 123:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 123: 
GAGTTTAATG GAAGAACTAA CC 22 



(2) INFORMATION FOR SEQ ID NO: 124: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 124:
CCTCATGGAG AAACATCCTA AGT 23



(2) INFORMATION FOR SEQ ID NO: 125:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 24 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 125: 

AGGGAGTGCA CGGCTGAGCT CCTG 24 



(2) INFORMATION FOR SEQ ID NO: 126:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 6254 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ix) FEATURE:

(A) NAME/KEY: modified_base

(B) LOCATION: 1287..4273

(D) OTHER INFORMATION: /note= "N = A or G or C or T"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 126:

AGCCAGCACT GTTCTTGGCA CATGGTAATC TTAACATATT TTTTCCTACA GGGAGGCCTG 60 

GTGTCAGGCC GGGAGTGGGG TGGAAGGGTC CCAAAATGGA TGGAAGGGCC CCAAAATGGC 120 

CGTGAGCATC CTCTGCCCTT GAGAAGAGCT AGCCCAGCTG TCTAGAGCTC CCTGCTGCTG 180 

CCGCTCTCGT AAGCAGCAAG CATTTTTGGC TCTCCTGTCT CAGCATGATG CCCCTACAAG 240 

GTTCTTTCGG GGGTGGGACC CAACGCTGCT CTCCTGATGG CCTCCCTGGC TCCCAGCACC 300 

TTCCATCCCA GCTGCTCAGG GCCCCTCACC TGCGCCTCCC CCACCCTCCC CTCTGCCCAC 360 

TCCCATCGCA GGCCATAGCT CCCTGTCCCT CTCCGCTGCC ATGAGGCCTG CACTTTGCAG 420 

GGCTGAAGTC CAAAGTTCAG TCCCTTCGCT AAGCACACGG ATAAATATGA ACCTTGGAGA 480

ATTTCCCCAG CTCCAATGTA AACAGAACAG GCAGGGGCCC TGATTCACGG GCCGCTGGGG 540

CCAGGGTTGG GGGTTGGGGG TGCCCACAGG GCTTGGCTAG TGGGGTTTTG GGGGGGCAGT 600

GGGTGCAAGG AGTTTGGTTT GTGTCTGCCG GCCGGCAGGC AAACGCAACC CACGCGGTGG 660

GGGAGGCGGC TAGCGTGGTG GACCCGGGCC GCGTGGCCCT GTGGCAGCCG AGCCATGGTT 720

TCTAAACTGA GCCAGCTGCA GACGGAGCTC CTGGCGGCCC TGCTCGAGTC AGGGCTGAGC 780

AAAGAGGCAC TGATCCAGGC ACTGGGTGAG CCGGGGCCCT ACCTCCTGGC TGGAGAAGGC 840

CCCCTGGACA AGGGGGAGTC CTGCGGCGGC GGTCGAGGGG AGCTGGCTGA GCTGCCCAAT 900 

GGGCTGGGGG AGACTCGGGG CTCCGAGGAC GAGACGGACG ACGATGGGGA AGACTTCACG 960 

CCACCCATCC TCAAAGAGCT GGAGAACCTC AGCCCTGAGG AGGCGGCCCA CCAGAAAGCC 1020

GTGGTGGAGA CCCTTCTGCA GTAAGGAGCC CTGCCCCGTC CCCGCTCCCA GGAGAGCCTA 1080

GAGGGGCCCC CCTCAGCTCC TAACGAGCCC CCCTTCTGAG TTGAGTCCCC ATGACCTTCA 1140

GCCTTTAGCC TAGTTGCTGG GAAGGGGGAC AGGGCCCATG AGAGCCCAGG GGTCCTTGCT 1200

TGGAGGTTTG AGCCTCCAGC CCCTGAACTG CTCCTCTGCA GAGTCCCAAA TCCCATGAGC 1260

CCAGGCCTTT AGCCCAGTCC TTGGGCNAGG GGGACATTTC CCAGGGGGTC CAAGATGGGA 1320

GAAAAAGCAG TGAATTCACA ACTCAAATGC CCACCCACCC ATCCATCCAT CCGTCCATCC 1380

ACCCATTCAT CCATTCATCC ATTCACCCAT CCATCCATCC ACATATCTTC ATCTGTGTTG 1440

TGTGTCTGTG TATCCATGTT TCTAAACCTT TATCTGTTCC AGTGTCTGTA TCCATAGGCC 1500

TGTGTCCACG TTTGTCATGT GTGTGCGTCN ACAAGTCTCT GTCCTCATGA CCATGTGTCT 1560

GTGTCCCTGT GTCCTGGCAT AAATGACCAT ACCTCACCGT CCCTGAGTCT ATGTGTAGGC 1620

CCCTGGGCTC CATAACTGCT TTCATGCACA GTCCCCACCC TCAGAGTTGA CAAGGTTCCA 1680 

GCACCCAGGA CCGCAGCCCC ACCTATGGGG AGAGACAGCC CTTGCTGAGC AGATCCCGTC 1740 

CTTGCCCTCT CCCAGGGAGG ACCCGTGGCG TGTGGCGAAG ATGGTCAAGT CCTACCTGCA 1800 

GCAGCACAAC ATCCCACAGC GGGAGGTGGT CGATACCACT GGCCTCAACC AGTCCCACCT 1860 

GTCCCAACAC CTCAACAAGG GCACTCCCAT GAAGACGCAG AAGCGGGCCG CCCTGTACAC 1920 

CTGGTACGTC CGCAAGCAGC GAGAGGTGGC GCAGCGTAAG TAATGACCCT ACCCCGCATC 1980 

TTCCCTGGGA GGGCCCAGGA CTCTCCCCTA ACTCATAGGT GGGGGCTGGA AGCTTCACCA 2040 

TCCCCATTAC ACAGACAGGT AGATGGAAAG GAAGTCAGTG GGATTCAACC TGCATTTATT 2100 

ACCTATTCTG CGCCAGGCAC TCTGTGGGAC GGGAGTANAC TTGGTCCTGA ACATCCAAAG 2160 

ATGAATGAAA TGGGTCCCTG CTTTCTTTTT CTTTTTTTAG ATACGTGACT CTGGAAAAAT 2220

ATGTAAGCTC TCTGAGCCTC AGCTTCTTCA TCTGTACAAT GGGGATAGTA AATGTGCCAA 2280

ATCAGAACAA ATGCTAATGC TTACCTGCAG TCTTGTACTG AGAAGGATGG TGAGATCATA 2340

TCTTGGGTTG GTAGGAAAGC ATTCAGGGAT TGATTAGTGA TGTTTGCCTT GAACACAGGT 2400 

TAAGAAAGTG ATGGCATGTG TGCTGTGTGT TTGTCATCAG TAGATTAGAT GATTTCTAAG 2460 

TTCTAGCTGT AAGCTCCTCT GGTTCAGCGC CATGGCAATG AGAAAGAATC AAGGGCAAGG 2520

TCAGGGGAAT GGACGAGGGA AGGTGAGAGT GGCCAGTACC CCACTCACGG CTTTCTGTGC 2580

CTGCAGAGTT CACCCATGCA GGGCAGGGAG GGCTGATTGA AGAGCCCACA GGTGATGAGC 2640 

TACCAACCAA GAAGGGGCGG AGGAACCGTT TCAAGTGGGG CCCAGCATCC CAGCAGATCC 2700 

TGTTCCAGGC CTATGAGAGG CAGAAGAACC CTAGCAAGGA GGAGCGAGAG GTACAACGGC 2760 

GGGCGGGAAA CAGTGCTGGT TTGGTCTGGG CTGCGGCAAG GCCAGGGGAA GGGGAAGGTG 2820 

ACTCTAGGTC CTGTAAAAGG CTGTCCAGTT GCCGAGAACT CCTGATATTG GCTTAGCCTG 2880

GCCCAGAAAA TTGAGAATAC TTGAACCTAA GCCCATTCCT CGCAGCCCCC CTGCACCNTG 2940 

GACACCAAGC AACCCCTTCC ATGGATGCTC ACCCAATTCG ATTCTCTCTA CAATCCTATG 3000 

GCTCTTTTGC TCACTTTATG AATGGAGAGA CTGAGGTCAG ACAGACTGTC AATTGCCCAA 3060

GGTCACACAG CAGACCTGGC ATTGGAACCC AGATCTGCCA GCCTCAAACC CTCCGGCAGA 3120 

GNTCAGCTTC TCAGAACCCT CCCCTTCATG CCCAGGACAG GGTTCCTCTG AGCCTGGCCT 3180

GGAGGCTCAT GGGTGGCTAT TTCTGCAGGG CGGAATGCAT CCAGAGAGGG GTGTCCCCAT 3240

CACAGGCACA GGGGCTGGGC TCCAACCTCG TCACGGAGGT GCGTGTCTAC AACTGGTTTG 3300

CCAACCGGCG CAAAGAAGAA GCCTTCCGGC ACAAGCTGGC CATGGACACG TACAGCGGGC 3360

CCCCCCCAGG GCCAGGCCCG GGACCTGCGC TGCCCGCTCA CAGCTCCCCT GGCCTGCCTC 3420

CACCTGCCCT CTCCCCCAGT AAGGTCCACG GTAAGTGGTA TGTGGGGACA AGGGACACGT 3480

GGGAAGGTGG GAGGGTTGGG GAGGACTGTC CCATTGACAG CAGTCACCTA AACCTCTTTG 3540

CACGTCAGTT TGGTTCCATT CGCAGCTGAC CCAGGGATTG GCAAAAGGTA GAAACAAAGG 3600

CAGATTTGCT GGCTGCATAA AGGCAGACAG GCAGATGGCC TAAGCAAACC AATGGAGTTT 3660

GAAGTGCTGA GGGCTGTGGA GGCAGGGGAG GGCAGGGAAG TGGGGTGCTG AGGCAGGACA 3720

CTGCTTCCCT CTCCAGGTGT GCGCTATGGA CAGCCTGCGA CCAGTGAGAC TGCAGAAGTA 3780

CCCTCAAGCA GCGGCGGTCC CTTAGTGACA GTGTCTACAC CCCTCCACCA AGTGTCCCCC 3840

ACGGGCCTGG AGCCCAGCCA CAGCCTGCTG AGTACAGAAG CCAAGCTGGT GAGTGTCCTT 3900

GCTTGTAAGG AAAACCCAAC CTCATCTTTC CTTGGCAGGG AGATTCTGGA GCAGTCCCTA 3960

GGGAGGCCCT GTGGGGACCC CGGCCCCCCG GACACAGCTT GGCTTCCCCT CGTAGGTCTC 4020

AGCAGCTGGG GGCCCCCTCC CCCCTGTCAG CACCCTGACA GCACTGCACA GCTTGGAGCA 4080

GACATCCCCA GGCCTCAACC AGCAGCCCCA GAACCTCATC ATGGCCTCAC TTCCTGGGGT 4140

CATGACCATC GGGCCTGGTG AGCCTGCCTC CCTGGGTCCT ACGTTCACCA ACACAGGTGC 4200

CTCCACCCTG GTCATCGGTA AGCTGGTGGG GATGGGTGGG CACCTGGGTG GGAGGCTCAT 4260

GGGGCAACCG CANAATCCAG GAGCTGGAAA AGCCACTGGG ACTCATTCAT TCATTCATTC 4320

ATTCATACAA CATGTTAGGA GAGGGGAGCA GAGAACTGAC CCCATGGCCT TTGCACTGCT 4380

GTGGTACCCC AGGGCTCCAG GGAACCGCAG TTTGACAACT TTTGAACAAG TCACCGCTTG 4440

CTTTTCCCAT TAGCTTAGAC AAAGAGCTAA AGGCTCAGAG AGGGGGAATG ACTTGCCAGA 4500

GCCACTTAAA TTAGTGGCAG GTCCCAGTGG AGGGCTGTTT CCTGACCACC TTGCCCCTTC 4560

TTCCAAACCA CGGGCTCTGG GAAGGAGAGG TGGTGCCCTT GGGAGGTCTT GGGCAGGGGT 4620

GGGATATAAC TGGGGGGCCC AGCTGATTCC CTCCCCTTCC ACTCCAGGCC TGGCCTCCAC 4680

GCAGGCACAG AGTGTGCCGG TCATCAACAG CATGGGCAGC AGCCTGACCA CCCTGCAGCC 4740

CGTCCAGTTC TCCCAGCCGC TGCACCCCTC CTACCAGCAG CCGCTCATGC CACCTGTGCA 4800

GAGCCATGTG ACCCAGAACC CCTTCATGGC CACCATGGCT CAGCTGCAGA GCCCCCACGG 4860

TGAGCACCCT GTGCCCCACA CAGCAGGAGA TGATGATAGA GGTTGGCTGT CAATGGATGC 4920

AGGGGAAAGG GGTGCCTGGC AGGCATTGCA GTCTGCATGT GTCTCTGGGA CAAGTGTGTT 4980

TCCGTGATTG AGGGTGTCTG CAGGCCAGTG TGTTCCCATG TGAATGCACG TATCTGTGTG 5040

TGTGCACGAC TGCTTGTGTG AGCAGATCCC TAGTGCGTGT CTGGGTGTGT ATCGGTTGTG 5100

CATGCATTTG TGTGCATGCC TGTGTTTCTC TGAAACTCTT AGGGCGATAT GAATTTCTAA 5160

AATCTATTCA GACCAGTTTT GAAAATCAGC CTTGGATCTC CAACTGCTGC CCAGTCTGGC 5220

TGTTCAGCAG GCCCCATGCC CCCCTTTCCC CAGTCTTGAG GCCTGGGACT AGGGCTGTCA 5280

GGCACGTTTG CCACGTCTGC CCCTCTCTCC CCTGCGGCCA GCCCTCTACA GCCACAAGCC 5340

CGAGGTGGCC CAGTACACCC ACACGGGCCT GCTCCCGCAG ACTATGCTCA TCACCGACAC 5400

CACCAACCTG AGCGCCCTGG CCAGCCTCAC GCCCACCAAG CAGGTAAGGT CCAGGCCTGC 5460

TGGCCCTCCC TCGGCCTGTG ACAGAGCCCC TCACCCCCAC ATCCCCCGGG CTCAGGAGGC 5520

TGCTCTGCTC CCCCAGGTCT TCACCTCAGA CACTGAGGCC TCCAGTGAGT CCGGGCTTCA 5580

CACGCCGGCA TCTCAGGCCA CCACCCTCCA CGTCCCCAGC CAGGACCCTG CCGGCATCCA 5640

GCACCTGCAG CCGGCCCACC GGCTCAGCGC CAGCCCCACA GGTGAGAGGC CCTGGCTCCA 5700

CCCCCTCCCT TACTGTCCCT GCCCCCTTCC ATGTTGGTCC CACCCCTTCT GTTGCTGTCC 5760

GTCACTGTGG GGCTGTGCAT GCAGCAGGCC TAGGGCTGCT GTGAGGAAGC ACTGGCAGGC 5820

GTGGAAGGGT GGGGTGGCTT CCATGAATCC AGTGTTCACA GTAAGATGTA CTCAGGCCAG 5880

TCCATGGGCG GCCGTGGACC CTGGCTGGGA GGCTCCCTTT GTTAAGAACC GAGGGTAGAG 5940

GTGTGACTTT GGGGTTCCTG TTATGTGCTG TGATCCAGGA GGTGTGGCCC TGCCTCCCCA 6000

TCCTGAGTAC CCCTAGGGAC AGGCAGGTGG GGTGGGTGTG GGTGCCTGGT GGGTGGCTAG 6060

CAGCCTTGTT TGCCTCTGCA GTGTCCTCCA GCAGCCTGGT GCTGTACCAG AGCTCAGACT 6120

CCAGCAATGG CCAGAGCCAC CTGCTGCCAT CCAACCACAG CGTCATCGAG ACCTTCATCT 6180

CCACCCAGAT GGCCTCTTCC TCCCAGTAAC CACGGCACCT GGGCCCTGGG GCCTGTACTG 6240

CCTGCTTGGG GGGT 6254



(2) INFORMATION FOR SEQ ID NO: 127: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 631 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS:

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 127:

Met Val Ser Lys Leu Ser Gln Leu Gln Thr Glu Leu Leu Ala Ala Leu
1 5 10 15

Leu Glu Ser Gly Leu Ser Lys Glu Ala Leu Ile Gln Ala Leu Gly Glu
20 25 30

Pro Gly Pro Tyr Leu Leu Ala Gly Glu Gly Pro Leu Asp Lys Gly Glu
35 40 45

Ser Cys Gly Gly Gly Arg Gly Glu Leu Ala Glu Leu Pro Asn Gly Leu
50 55 60

Gly Glu Thr Arg Gly Ser Glu Asp Glu Thr Asp Asp Asp Gly Glu Asp
65 70 75 80

Phe Thr Pro Pro Ile Leu Lys Glu Leu Glu Asn Leu Ser Pro Glu Glu
85 90 95

Ala Ala His Gln Lys Ala Val Val Glu Thr Leu Leu Gln Glu Asp Pro
100 105 110

Trp Arg Val Ala Lys Met Val Lys Ser Tyr Leu Gln Gln His Asn Ile
115 120 125

Pro Gln Arg Glu Val Val Asp Thr Thr Gly Leu Asn Gln Ser His Leu
130 135 140

Ser Gln His Leu Asn Lys Gly Thr Pro Met Lys Thr Gln Lys Arg Ala
145 150 155 160

Ala Leu Tyr Thr Trp Tyr Val Arg Lys Gln Arg Glu Val Ala Gln Gln
165 170 175

Phe Thr His Ala Gly Gln Gly Gly Leu Ile Glu Glu Pro Thr Gly Asp
180 185 190

Glu Leu Pro Thr Lys Lys Gly Arg Arg Asn Arg Phe Lys Trp Gly Pro
195 200 205

Ala Ser Gln Gln Ile Leu Phe Gln Ala Tyr Glu Arg Gln Lys Asn Pro
210 215 220

Ser Lys Glu Glu Arg Glu Thr Leu Val Glu Glu Cys Asn Arg Ala Glu
225 230 235 240

Cys Ile Gln Arg Gly Val Ser Pro Ser Gln Ala Gln Gly Leu Gly Ser
245 250 255

Asn Leu Val Thr Glu Val Arg Val Tyr Asn Trp Phe Ala Asn Arg Arg
260 265 270

Lys Glu Glu Ala Phe Arg His Lys Leu Ala Met Asp Thr Tyr Ser Gly
275 280 285

Pro Pro Pro Gly Pro Gly Pro Gly Pro Ala Leu Pro Ala His Ser Ser
290 295 300

Pro Gly Leu Pro Pro Pro Ala Leu Ser Pro Ser Lys Val His Gly Val
305 310 315 320

Arg Tyr Gly Gln Pro Ala Thr Ser Glu Thr Ala Glu Val Pro Ser Ser
325 330 335

Ser Gly Gly Pro Leu Val Thr Val Ser Thr Pro Leu His Gln Val Ser
340 345 350

Pro Thr Gly Leu Glu Pro Ser His Ser Leu Leu Ser Thr Glu Ala Lys
355 360 365

Leu Val Ser Ala Ala Gly Gly Pro Leu Pro Pro Val Ser Thr Leu Thr
370 375 380

Ala Leu His Ser Leu Glu Gln Thr Ser Pro Gly Leu Asn Gln Gln Pro
385 390 395 400

Gln Asn Leu Ile Met Ala Ser Leu Pro Gly Val Met Thr Ile Gly Pro
405 410 415

Gly Glu Pro Ala Ser Leu Gly Pro Thr Phe Thr Asn Thr Gly Ala Ser
420 425 430

Thr Leu Val Ile Gly Leu Ala Ser Thr Gln Ala Gln Ser Val Pro Val
435 440 445

Ile Asn Ser Met Gly Ser Ser Leu Thr Thr Leu Gln Pro Val Gln Phe
450 455 460

Ser Gln Pro Leu His Pro Ser Tyr Gln Gln Pro Leu Met Pro Pro Val
465 470 475 480

Gln Ser His Val Thr Gln Asn Pro Phe Met Ala Thr Met Ala Gln Leu
485 490 495

Gln Ser Pro His Ala Leu Tyr Ser His Lys Pro Glu Val Ala Gln Tyr
500 505 510

Thr His Thr Gly Leu Leu Pro Gln Thr Met Leu Ile Thr Asp Thr Thr
515 520 525

Asn Leu Ser Ala Leu Ala Ser Leu Thr Pro Thr Lys Gln Val Phe Thr
530 535 540

Ser Asp Thr Glu Ala Ser Ser Glu Ser Gly Leu His Thr Pro Ala Ser
545 550 555 560

Gln Ala Thr Thr Leu His Val Pro Ser Gln Asp Pro Ala Gly Ile Gln
565 570 575

His Leu Gln Pro Ala His Arg Leu Ser Ala Ser Pro Thr Val Ser Ser
580 585 590

Ser Ser Leu Val Leu Tyr Gln Ser Ser Asp Ser Ser Asn Gly Gln Ser
595 600 605

His Leu Leu Pro Ser Asn His Ser Val Ile Glu Thr Phe Ile Ser Thr
610 615 620

Gln Met Ala Ser Ser Ser Gln
625 630 

(2) INFORMATION FOR SEQ ID NO: 128:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 6433 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 128: 

CATGAACCCC GAAGAGTAGT GTCTTCTCTC TGGACTAAAG CGGAACTGAG AACCGGTGGA 60 

AAAGCCCCGC GCCTAGGCTG CAAGGCACTG GCTTAACAAG TCCAAAGGTT AGGTGAAGTT 120 

TGGCTGATAA GCAGAACCAG TAAAAGAAGG TCTCTAGCCC CCCAGCGTGA GTACAATGGA 180 

CCCTGGCAAA GCCCCGCTCC CGGCCCAGGT CTTCTGCTCT CCAGGTCTGC CCCTCCGGCT 240 

CTCCCTCTCT CCGGGTTTCC CCCTCCCCAC CATCATTTGC ATCCAGCCGA AAGCTGGGCC 300

CTTCCCACTA ATTTGCATAT CTTATATGGC CTAATGGTGG CGATCATGGC AAGTTAGAAG 360

TTTTCTGACT CCTTTCGGAG GAGCCTCCGG GACCCCGGGG AGTAACAGGT GTCTGGAGGC 420

TGAAGGGTGG AGGGGTTCCT GGATTTGGGG TTTGCTTGTG AAACTCCCCT CCACCCTCCT 480

CTCTCGCACC CACCCACCCC CTCACCCCCT TCTTTTTCCG TCCTTGGAAA ATGGTGTCCA 540

AGCTCACGTC GCTCCAGCAA GAACTCCTGA GCGCCCTGCT GAGCTCCGGG GTCACCAAGG 600 

AGGTGCTGGT TCAGGCCTTG GAGGAGTTGC TGCCATCCCC GAACTTCGGG GTGAAGCTGG 660 

AGACGCTGCC CCTGTCCCCT GGCAGCGGGG CCGAGCCCGA CACCAAGCCG GTCTTCCATA 720 

CTCTCACCAA CGGCCACGCC AAGGGCCGCT TGTCCGGCGA CGAGGGCTCC GAGGACGGCG 780

ACGACTATGA CACACCTCCC ATCCTCAAGG AGCTGCAGGC GCTCAACACC GAGGAGGCGG 840 

CGGAGCAGCG GGCGGAGGTG GACCGGATGC TCAGGTAGGC GCAGAGCCAG GTGGAGGGGA 900 

CCCACCCGAA CCCCTGGAGC CCCGGCCCCG GGCCTGAGTG ACACTGCGCC CGACCACACT 960 

CGCCAAGCCC GTTTCCCACC AAAAAATTCC CCCGGGGGGC GCTCTGCTTC TCTCCCAACA 1020 

CCCGGACCCT TCCCAATCCC TTAGCGGGAC AACCCTGCGG CCCACCGGGC TTCTTCTCCC 1080

CAGGCCCAGG CCATCGTCCT CAGAAGAAAG GGATGAGGTG TACCGTACAG GGGCAGTCAC 1140

CTTCTCCTCT GTTTAGCTTC CATTTTGGCC TCATGTCTAC CCCAAAGTTG TAGCTTAGAT 1200

GGGGGGAAAA TTCAGAATTT TGCATAGACC ATAGGTAGCA CCCCCTAGAA AAAGAATGTT 1260

TCTCCCCAGA TGTCTCCCAC TAGTACCCTA ACCATCTGCT TGTCTGTCTA GTGAGGACCC 1320

TTGGAGGGCT GCTAAAATGA TCAAGGGTTA CATGCAGCAA CACAACATCC CCCAGAGGGA 1380

GGTGGTCGAT GTCACCGGCC TGAACCAGTC GCACCTCTCC CAGCATCTCA ACAAGGGCAC 1440

CCCTATGAAG ACCCAGAAGC GTGCCGCTCT GTACACCTGG TACGTCAGAA AGCAACGAGA 1500

GATCCTCCGA CGTAAGTGTT TTCATCCTGC CTCTGCCTCA ACCTGAAGTG ACCTTTGCCC 1560

TCTCACCCCA TTGGCTGCCT CAGTTTCCCT TTCATCGACA AGGCCTTGTG AGCACTTGGC 1620

AGATATGAGG AAGGTGGCAA GTAGATTTGG CCTTGGTGGT TGCTGTACAA TGGATTGGCT 1680

TCTGTCATGT TCTTCAGTCA CAGCCCCCTT GCTACCCAGC CAGTTGCTCT GAGGAGCCTG 1740

TCAGTGTGAT TGAGCTCACC CACTTGACAT CAAATACAGG AGTTCAGGAT GCAGAGTGTT 1800

GCTTCATCTC TGAAGGCCAG TGAGCCAAAG GGGAAAAAAT AATAATTTTC TTAAAACTAT 1860

AGCTGGCTAT GTTTGAGCTC CTTCAAAGAA AGGAAAAGGG TGGCTTTGCT GGAGCAACTG 1920

AGGTGGGCAG TAAGGGCCTG TGCTGAGGGC TCCCCATCTC CAGCTCCACA TGCAGTGAGA 1980

GAAGGTTGCA AAGCTTAGTT AGACGAGGGG AATAAACCTG TCTTCGTCCG TTGTCTGTCT 2040

GTCTGTCTGT CTGTCTGCTG AGTGAAGGCT ACAGACCCTA TCAAATCTAC TCCTTTCTCT 2100

TTTCAGAATT CAACCAGACA GTCCAGAGTT CTGGAAATAT GACAGACAAA AGCAGTCAGG 2160

ATCAGCTGCT GTTTCTCTTT CCAGAGTTCA GTCAACAGAG CCATGGGCCT GGGCAGTCCG 2220

ATGATGCCTG CTCTGAGCCC ACCAACAAGA AGATGCGCCG CAACCGGTTC AAATGGGGGC 2280

CCGCGTCCCA GCAAATCTTG TACCAGGCCT ACGATCGGCA AAAGAACCCC AGCAAGGAAG 2340

AGAGAGAGGC CTTAGTGGAG GAATGCAACA GGTAACACCA CCAGAAGCTC AGGTGGGCAG 2400

GTGGGCAAGT ACACAGACCC AGGAACCCTC CCCTCGGTCC TGGGATATTG AGACACTAGT 2460

TATACAGATA AGTGTGGCTA AATCAGAGCT TCTCAAAGTA TGTTCCACAG TGATTGTGTG 2520

TTTTGGGCCA AGCACCAACA AGTCCCCCCG CCCCCCTTCA CTCACCATCT CCCCTCCATC 2580

CATTCCCAGG GCAGAATGTT TGCAGCGAGG GGTGTCCCCC TCCAAAGCCC ACGGCCTGGG 2640

CTCCAACTTG GTCACTGAGG TCCGTGTCTA CAACTGGTTT GCAAACCGCA GGAAGGAGGA 2700

GGCATTCCGG CAAAAGCTGG CCATGGACGC CTATAGCTCC AACCAGACTC ACAGCCTGAA 2760

CCCTCTGCTC TCCCACGGCT CCCCCCACCA CCAGCCCAGC TCCTCTCCTC CAAACAAGCT 2820

GTCAGGTAAG CAAAGGTTGG GCCTCACTGC CTCGGCAACC CAACCATCCT GGTTCTTGCC 2880

ACGGATCTTA TCTGGTTTAA GGGTTTTCAG AGGAGCAAAC GCTTTTGAGA TGATCCTAGG 2940

GCCGCTCTCT CATTGCCAGA ATATACTCCC CTGGAAATAA TGTGTGGCTC TGATCAGTTC 3000

CAAGGCACTG GGGATACATC AGTGAACAAA ACAAACGAGA TAAAAATTTC CTGCCCTCGT 3060

GGCGCTTACA TTCTAGAATT AAATAGAGAA CATGCCATAT TTACCCTGGA GAAAAGCAGC 3120

CGATATTTCT TGTGGGTGGA CAGGGGAGGA GAAAGCAACT TTATTTTCTT ATTACCCACC 3180

CTTGAAAACA AGAGGTGCCG AGTCATTGTT CCAGGACCCT GGTGGCACTA ATGTTCCCTA 3240

CTGGGTTTGT GTTGTTTTGC AGGAGTGCGC TACAGCCAGC AGGGAAACAA TGAGATCACT 3300

TCCTCCTCAA CAATCAGTCA CCATGGCAAC AGCGCCATGG TGACCAGCCA GTCGGTTTTA 3360

CAGCAAGTCT CCCCAGCCAG CC^GACCCA GGCCACAATC TCCTCTCACC TGATGGTAAA 3420

ATGGTGAGTA CACCTGGGCC ATTGTCGCTC TGGAGCTGAT AAGATAAGAG GCAAAACAAA 3480

CACAACTTCT CACAAGGCCT GCCTCAAACA ATGAACCATT GTAGCCCCAT AGGGGAAAAT 3540

GAGGGCTGTC CAGAGTCGGA AAGGAGAGGT AGTGCTGGTG ACCCACCCrr TGGCGGGTAG 3600

AAAACCCAAA GTGATGGGAT TACAGGGGTG AAGCACCATG CCCAGCCAAT AATTGTTATT 3660

GAGTGAATGA AGGAATGAAT TTGAGAACTA GTCATGCCAA GGAATCGCTA AGTCACATCG 3720

TGTTGGAAAC TGCTCTTTGT GGTCCAAGTC CACCCATGTT TCTCTTGTTT TTTTCTCTCC 3780

ATCAGATCTC AGTCTCAGGA GGAGGTTTGC CCCCAGTCAG CACCTTGACG AATATCCACA 3840

GCCTCTCCCA CCATAATCCC CAGCAATCTC AAAACCTCAT CATGACACCC CTCTCTGGAG 3900

TCATGGCAAT TGCACAAAGT AAGTTCTATT CTTGGTTGGA AAACCTGGGG GCAGGGAGAA 3960

GAAGAATGGG AAGCAAATTA ATGTGGTGAA AAATAACTGT AGGTCTCCTT CAAACTCACC 4020

CACAACTAGT AAATTTGGTT TAACTTCTTT AGTTTCTCAT CTGTCTCCTT AAATCCAATA 4080

TTTGGATTGT TTAGCCTAAA ACAAGAAAAA ATTGTGGAAT GGATTTGGAT CCTGGTCACA 4140

GTTTAGCAGC TGTGCATCCT GGGTCAAATC ATTGAACCTA TGACTCTGGG AGACTCTCAG 4200

GCTTTAATCA GATCTGTTTA ATGCCCATCT CCAACCCACA ACTCATTGTG GAACTTGAGC 4260

AAGTAAATTA ATATCTCCAA GTCTCCGTTT CTTTACACTT GCCTCCCATG GAATCTCCTA 4320

TGTAACAGGC TCAGCCCGGT GACTGGGACA TTGAGCGGGG GCTCAAATGA TGGCATCCAT 4380

CCACCTCTCC TTATCCCAGG AGCTGTCTGT GTCTTTTCCT CTTGCTCCCA CAGGCCTCAA 4440

CACCTCCCAA GCACAGAGTG TCCCTGTCAT CAACAGTGTG GCCGGCAGCC TGGCAGCCCT 4500

GCAGCCCGTC CAGTTCTCCC AGCAGCTGCA CAGCCCTCAC CAGCAGCCCC TCATGCAGCA 4560

GAGCCCAGGC AGCCACATGG CCCAGCAGCC CTTCATGGCA GCTGTGACTC AGCTGCAGAA 4620

CTCACACAGT AAGGACACGG GCATGTGGAG GGAGGGAGCA CTCAGGACCC TCAGTGGCCA 4680

ACCACTTTCC CTCTCTGGGT CTGAACTTTC TCGGAAGTTT ATTGGCTTGG TCACTTTTCC 4740

CTGCCTATGA TCAACCGACT AAGACAATTT CTCAAGCATA ACTCTTGAGT GTTGCTGTAC 4800

CTTTTCTAGT CCTCTTCTCT ACCCCTGAGA TTCCCAGGGA AGGGTTTGAA TGACCTTTGC 4860

TCCCGTTCCG TACCGGAGGC CTCCCTGGTA GGAAATGTGT TCTGAGAGCA GGTGGTTTCT 4920

CCCTCACAGC CAAGCATCCA CATGCTTTCG GGAGTTGGTT ATGTGACTTG GAATTTACAT 4980

GAATCTTATG GATAACTAAT ATGAGAAATC CCCACTATAA CCACCAGCCC TTTTATCTAC 5040

CTGAGGAGAT GGGAGCTATG GTGTGGGATG GGGGCTCTGT ACCTGTGTCT TTGCCTGTGT 5100

ATGCACCTTG ATTCTGTCTT CACTCTGTCT CTCCAGTGTA CGCACACAAG CAGGAACCCC 5160

CCCAGTATTC CCACACCTCC CGGTTTCCAT CTGCAATGGT GGTCACAGAT ACCAGCAGCA 5220

TCAGTACACT CACCAACATG TCTTCAAGTA AACAGGTAAT GCCAGCAGGA TATGCGGGGG 5280

TTGGGGTGTG GGCAGGGTGT GATAAGGCCA TGGATGTGCA AAGGTTGTGG CAAGCATGGA 5340

CTCGGCCAGA AATTATATCC TCTTTGCTGG TTGAGTTGGG CATCATCTCC CTTAGAGAAG 5400

CCAAACTAAT GGCCCATGAC CCTGCCAAAT GACACAGCTG AGCACCCTCT CTCCTCTCTC 5460

TCTGCAGTGT CCTCTACAAG CCTGGTGATG CCCACACACC ACTTACTTCG TGCGCAACAA 5520

CAAGGACCCT GTTTTCCACA CCATCACCCT CTGGGCAGCT GTCATGGAAA AGCCCAGTGA 5580

CCTGACCAGC ACCTGCGAGA GGTCCCTGCT ACCTGACGGA CGTCCTGCTG GCACCTCAGA 5640

CAATCCACTC TCAGGAGGCG CAGCCCGAAG CCCAGTTTCC CTTCTATGCA GTATTGCCAC 5700

AATGCCTCTC CCACGATGTC AAGGACTCCT GTCTGTCCTG GAGGTGGGAG ACAAGGAACC 5760

ACCGAAGAGG AAGCAAGAAA GCCGTACTGT CTATGTTGTG ATCCTTCATC GAACAAACTG 5820

ATGCGAAAAC TTGAATCTGT TACTGAAATG AGGAGAGAAG GACATGTGCT ATTGAACTGA 5880

GCCAAACACA CTGTAAATAT CCACAGACTC CCTCCCCTGC CCCCATCCCA CATGATCTTG 5940

AGATTTCTTT TAAAGAAGTA AATTTGTCCA ATGGCTGTAA ACTATAAACT ACTGTAATTA 6000

AGTGCAATTT CCCCTCTGTG TCCTCTCCCC TCTGCCCTGT ATATAATACT AAAGTGTCTA 6060

TTAGTTTTCT TTGTAAAGGT CAGAGTCAAA ATTTCAAAAG TGATCTGTCC CCTCTCCCCT 6120

CATGGAGAAA CATCCTAAGT GGGAAGTGAA GCCCCTTGTC CTCTCCCGCG GGCCTGGACA 6180

CTTATGGGGA CAGCATACCT TGGACTGACT ACCAGCTAAC TCCAGTCTCC TGACATTAAG 6240

ACACACCTCT GGATCCCTGG AGGGGCTGAA TGTAGTGTGT CAGAGTAACA TGCCAGCTTC 6300

CTGTGGGCCA GGAGCTCAGC CTGCACTCCC TAAGAAACCC CAGGGCAGGG AAACTGGCTG 6360

TTTGATAGCA GAAGAAAAAG TTGCAGTCTC AAAAGCCTTC CATTAAAACA ATTTATTTTA 6420

TCACTAAAAA AAA 6433

(2) INFORMATION FOR SEQ ID NO: 129:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 609 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 129:

Met Val Ser Lys Leu Thr Ser Leu Gln Gln Glu Leu Leu Ser Ala Leu
1 5 10 15

Leu Ser Ser Gly Val Thr Lys Glu Val Leu Val Gln Ala Leu Glu Glu
20 25 30

Leu Leu Pro Ser Pro Asn Phe Gly Val Lys Leu Glu Thr Leu Pro Leu
35 40 45

Ser Pro Gly Ser Gly Ala Glu Pro Asp Thr Lys Pro Val Phe His Thr
50 55 60

Leu Thr Asn Gly His Ala Lys Gly Arg Leu Ser Gly Asp Glu Gly Ser
65 70 75 80

Glu Asp Gly Asp Asp Tyr Asp Thr Pro Pro Ile Leu Lys Glu Leu Gln
85 90 95

Ala Leu Asn Thr Glu Glu Ala Ala Glu Gln Arg Ala Glu Val Asp Arg
100 105 110

Met Leu Ser Glu Asp Pro Trp Arg Ala Ala Lys Met Ile Lys Gly Tyr
115 120 125

Met Gln Gln His Asn Ile Pro Gln Arg Glu Val Val Asp Val Thr Gly
130 135 140

Leu Asn Gln Ser His Leu Ser Gln His Leu Asn Lys Gly Thr Pro Met
145 150 155 160

Lys Thr Gln Lys Arg Ala Ala Leu Tyr Thr Trp Tyr Val Arg Lys Gln
165 170 175

Arg Glu Ile Leu Arg Gln Phe Asn Gln Thr Val Gln Ser Ser Gly Asn
180 185 190

Met Thr Asp Lys Ser Ser Gln Asp Gln Leu Leu Phe Leu Phe Pro Glu
195 200 205
195 200 205 
Phe Ser Gln Gln Ser His Gly Pro Gly Gln Ser Asp Asp Ala Cys Ser
210 215 220

Glu Pro Thr Asn Lys Lys Met Arg Arg Asn Arg Phe Lys Trp Gly Pro
225 230 235 240

Ala Ser Gln Gln Ile Leu Tyr Gln Ala Tyr Asp Arg Gln Lys Asn Pro
245 250 255

Ser Lys Glu Glu Arg Glu Ala Leu Val Glu Glu Cys Asn Arg Ala Glu
260 265 270

Cys Leu Gln Arg Gly Val Ser Pro Ser Lys Ala His Gly Leu Gly Ser
275 280 285

Asn Leu Val Thr Glu Val Arg Val Tyr Asn Trp Phe Ala Asn Arg Arg
290 295 300

Lys Glu Glu Ala Phe Arg Gln Lys Leu Ala Met Asp Ala Tyr Ser Ser
305 310 315 320

Asn Gln Thr His Ser Leu Asn Pro Leu Leu Ser His Gly Ser Pro His
325 330 335

His Gln Pro Ser Ser Ser Pro Pro Asn Lys Leu Ser Gly Gly Lys Gln
340 345 350

Arg Leu Gly Leu Thr Ala Ser Ala Thr Gln Pro Ser Trp Phe Leu Pro
355 360 365

Arg Ile Leu Ser Gly Leu Arg Val Phe Arg Gly Ala Asn Ala Phe Glu
370 375 380

Met Ile Leu Gly Pro Leu Ser His Cys Gln Asn Ile Leu Pro Trp Lys
385 390 395 400

Gly Val Arg Tyr Ser Gln Gln Gly Asn Asn Glu Ile Thr Ser Ser Ser
405 410 415

Thr Ile Ser His His Gly Asn Ser Ala Met Val Thr Ser Gln Ser Val
420 425 430

Leu Gln Gln Val Ser Pro Ala Ser Leu Asp Pro Gly His Asn Leu Leu
435 440 445

Ser Pro Asp Gly Lys Met Ile Ser Val Ser Gly Gly Gly Leu Pro Pro
450 455 460

Val Ser Thr Leu Thr Asn Ile His Ser Leu Ser His His Asn Pro Gln
465 470 475 480

Gln Ser Gln Asn Leu Ile Met Thr Pro Leu Ser Gly Val Met Ala Ile
485 490 495

Ala Gln Ser Leu Asn Thr Ser Gln Ala Gln Ser Val Pro Val Ile Asn
500 505 510
Ser Val Ala Gly Ser Leu Ala Ala Leu Gln Pro Val Gln Phe Ser Gln
515 520 525

Gln Leu His Ser Pro His Gln Gln Pro Leu Met Gln Gln Ser Pro Gly
530 535 540

Ser His Met Ala Gln Gln Pro Phe Met Ala Ala Val Thr Gln Leu Gln
545 550 555 560

Asn Ser His Met Tyr Ala His Lys Gln Glu Pro Pro Gln Tyr Ser His
565 570 575

Thr Ser Arg Phe Pro Ser Ala Met Val Val Thr Asp Thr Ser Ser Ile
580 585 590

Ser Thr Leu Thr Asn Met Ser Ser Ser Lys Gln Cys Pro Leu Gln Ala
595 600 605

Trp



(2) INFORMATION FOR SEQ ID NO: 130: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 10014 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 130:

TGGGTTGCCT GTGACTGCAC TGGCGATACC CCCACAAAGC CCACTCTGAA GGTAGGAGAC 60

GGGTGGAGAG AAACAGGGGG ATGGCAAGGG GGATACGAAA CAGGGAGAGG GAGGAGGGGG 120

AAGAGGATGG ACGXCTACCA GGCCCCACTT GGTGCTTGAT TTATGCCATC TCATTTCCTT 180

CTCAAACCAC CCTTTGAAGT TGATTGTACA TTTTACAGAA AAGGAAACTG AGGCTCGGAG 240

AGGAGAATCA TTTACCCAAG GTCCCAGTTA GTAGACGGTA GGTGCCTGAA TGTAAATCCA 300

OGTCTCTGCC TGCTCCGGGA GGGGGTGGGG GTGAGGGAAA CAGGAGAAXG TGATGGGAAA 360

ATCCGAGATG GAGCCAGCCT GGGCCAGAAA CACTGGGAGC TGTGGGAGAC GGAGAGGGGC 420

AGGGTGGGAT CACAGGGAGC AGGAGCGGGG AATTGGAGGT GAATCTGGCC CTCCCAAACT 480

TCCAGTCCAT TCTGCTCCCA GGGGAACCGG GAAACTGCGG GGGAACTGGA AGGGAGCTCC 540

CAGAACAAGG ATCCAGAAGA TTGGCATCTG GGGCCTGGGA TTTAGGTTTC TAAATCGTGG 600

GCCATGGGGC AGCCTTATCT CTGCAAAAGC ATTGAGGGTA GAAGTCAATG ATTTGGGAAG 660

TTATTGAATT AGGGGATCTC GGAGGTAGGC TGTCAGTGCC TGATAGTATC AGTTAGAATG 720

CCTGACTTGG GGTGACAATG GCTTGGAGGG GTGGGTGAGT CAAGGGTCAA ATGAGTGCCC 780
GTGAGTCATG ATGCCTGCCT TGTACAATTG ATAACTGAAC ATCGGTGAGT TAGGGCCCCA 840

GCAGTTGTAA TTAGCACCCC GGGTGTCAGC CAGAAACCAA CAAACAGCCA AATCCCTGCA 900

GCCCCGCCCA GCCTATCCAC CGGCGGGGGA CCGATTAACC ATTAACCCCC ACCCCTCCCC 960

GGCAGAGCCT CCACCCCTTC ACAGAGGCTA GGCCAAGACT CCCAGCAGAT CTTCCCAGAG 1020

GACGGTTTGA AAGGAAGGCA GAGAGGGCAC TGGGAGGAGG CAGTGGGAGG GCGGAGGGCG 1080

GGGGCCTTCG GGGTGGGCGC CCAGGGTAGG GCAGGTGGCC GCGGCGTGGA GGCAGGGAGA 1140

ATGCGACTCT CCAAAACCCT CGTCGACATG GACATGGCCG ACTACAGTGC TGCACTGGAC 1200

CCAGCCTACA CCACCCTGGA ATTTGAGAAT GTGCAGGTGT TGACGATGGG CAATGGTAGG 1260

TGGGGGCAGA TGTGCCCAGG TGTGCCAGTG GGGGCAGGTG TGCCTGGGTC CAGGAGCAGA 1320

TCTTTGGCAC TCAACTTTGG GGTGGGAGGA GAATGATACA AAATGGTAGG TTGGTCCTAC 1380

AGGCCAGCAC AGGTGTTGCC AAGTGAAGCC CATGTGCCCA GGCACAGTGA TCACAGGCAT 1440

TCTGGGTGAA GGGAGGCCTG CAAGGGCCAA TTTCCAGCAA AAGTCGATCC CGGCTATTCC 1500

TCCCAGGCCC TTCCAGTCCT CACTGCCTCA CAGTGGCTCT GCTTGGCGCT TGGCACAGTG 1560

ACATGATGGT GAGCTCCCCC TTGGTGCCCA GCTCCAGCGA TTCAGCCCAG CACGGCCCCT 1620

TCGTGAACCC CTTGGGCCTA GGTTCAGAGA GACGGCAAGG GATGTTGTAT CCCTGGAGAT 1680

GGTGGTTGGA GACATAACCG CATTTCTCGG TGTCTTTGGG ACTTTCCTAG GGAAATGAAA 1740

TTGGCACTTA GGGAAAATGG AGCTCTCAGG GAAGTTTTGC TAACTACGAA GCCAACTCAG 1800

CACTGTGTGT GTTGTGTGTG CGTTCGTGTG TGATAGTGAG TTTCCATGTA GGTTGTATGG 1860

GTGGGGTGAT GCCTTCAGGA ACCCATTTGC ATATGTGTGT TCATTTGTCT CTGTGTGTGA 1920

GTTCTGGGTC TATTTTCCTT TGTATTCATT GAGTGGGTCT GTGTTTGTGT CTTAGGAGTT 1980

GCCCGTGTTG ATCTTGCTTA TGTATGTAAG TGTGTATGTG TGTGTACTTG TGTCTGTGGA 2040

TGTTTGTACA TGTGTGCTGT GTGTGCGGGT CATAGAGCAC ATGCGTTTGT GCATGCGGAC 2100

CTGTTGGAGT GCCCTGTTCT TCCTGCATCT TTATCCTGTA TGGGCGTTTT GTCGTGTGCC 2160

CATATTTGTA CCTGCTGTGT ATATATGCAG TTCCCTGTGC TGCGGGCGGG GGTCAGCGGT 2220

CTCTGGTGTG CACGACTGCA CAGACCCAAA TGCAGGACTC TGTTGTTGCC ACTCACCAAG 2280

TGAGATTCAT ATCAGCAACA TGTCCGTTTG TCTCTGAGCA GATTTTGTTG CCGCTGCGTC 2340

TCGCCAGATT GAGGCATCCC CTCCGACATC ACTGGAGCAT ATCTGGAGGG GTGGACAGTT 2400

CTCCACAGGG AGGTAGGGGA AAAGAGGAGG CCCGGAAACC CCTCCTGGAG GGAAGAGCCC 2460
CATCGGTCCC AGGCCAGCCT CAGAGGAGAG GGGGCAGGCA GCTGGCTGAG GTCAGCCTGC 2520

CACCCTGCTT CCTTCTGTGT CTTGGAGCCA CTCAGCCAGT ATGAGGCTGC AGCTCCAGCT 2580

GAGGTCTGGA ATCTTGTGGT CAGCTCAGCT AGGGTGAGGA GGCAGCTGCT GGGCACTGCT 2640

TGTTGTCAGC TCAGCAGGTG CTCACCTGCC CCTGCCGTCC AGTCACGTGT GACCTTGGGC 2700

ATGTCACCTC CCCTATCCTG GCTTCTGTAT CTTCTACAAA ACAGGCTTCA TTCCCCCAGG 2760

CCTGCTGGCT GGACGGCTTT TAGGCCTGTC TGAGGACCAC GCCAGGAGCG CAAGGCAAAA 2820

ACACACCAGA GATCCCCTTG CGAGTTAGGA GGCCGGCTCC CACCCCAGAA GGTGGCCAGG 2880

TTTTCATGCC TTCCTAGAGA AAGCTGGGGC TGGTGGCCTC CACCACAGGG AGACGCAGAC 2940

CCTCAGAAAC AAGTCTGTGA AGTCACAACC AGCCCCAGTT TACAGATGTG AAACTGAAGC 3000

TCCAAAAAGT CAGGAGGTCA CTGAGTGGGG AGGTGATGGA GTGGGAACAG CCCCCAGATC 3060

TGGCTGAGGC CGAAGCCCTG GAGAGATCCC CGCAAGGCTC CCTTAGATGC CTGACATTCT 3120

GCTCTTCCTG AAGCCTCACT CCCTTCTCTC CTGGCGCAGA CACGTCCCCA TCAGAAGGCA 3180

CCAACCTCAA CGCGCCCAAC AGCCTGGGTG TCAGCGCCCT GTGTGCCATC TGCGGGGACC 3240

GGGCCACGGG CAAACACTAC GGTGCCTCGA GCTGTGACGG CTGCAAGGGC TTCTTCCGGA 3300

GGAGCGTGCG GAAGAACCAC ATGTACTCCT GCAGGTGAGG AGCCTCAATT TCTTCAGCTG 3360

GGAAATGGGC ACACTTGGGC TCATGGCCCC AAGGTCTGTC TTCTCCCTGA GTGGGTAGGT 3420

CCCAGAGACA GCTGCCCTTC AGGGCCTTCA AGGCTCTTCT GGTTTTGTAA AAGACTTTGT 3480

GAATCCAAGA AGAGCATCTA TTCTAGGAAC CACATTTACT GATCATCAAG CTACTGGCTG 3540

CCGTTTATTG AGCTCTTATC ATATGCCAGG CACAATACTA AGTCTTTGTG TGTATTTACG 3600

TACTCCAGAG GTCAAGGTTC CCAACTCAGC TCTAACACCA ACCAGCAGAG CGACCCAGGA 3660

CCACATGTTG CCTCTCTGAG CCTCAGTTTT CCCATGTTTA GCAGGACAGG ACTGGGCTCT 3720

TAGAGAGTTC ATAGCACCTT TCCAGCTCCT GGTGGGTTCA AGAGAGAACT CCCGGGATGA 3780

AGAGATGAGA GCACTGAGGT TGGGGGGTCA ACTGGATAGC CAGGGCCCTA GTTCTGTCCT 3840

AAGAGGAGGA AGTTGTGTCT TCTCCATCCA ACCATCCAAA GCCCTCCCCA GATTTAGCCG 3900

GCAGTGCGTG GTGGACAAAG ACAAGAGGAA CCAGTGCCGC TACTGCAGGC TCAAGAAATG 3960

CTTCCGGGCT GGCATGAAGA AGGAAGGTGA GCCTCGGCCC TCCCCGCCCC ACCACCACTG 4020

CCCCACCTGC ACCCACAGCT CCCCGACAGT CATTTACAAC TGTAGCCACA CTTTATGACT 4080

CAGTGGCAGG CCCCAGGGTG ACTGGCTAAT GGCTGAGAAG AGGGAGGGCC TGGAAATCTG 4140

ACCATAGGGA GCGGCTGGGC TTGGTCTTGA GAAAGATTCT CCCACTCCTC ATCAGTCACA 4200
GACACCCCCA CCCCCTACTC CATCCCTGTT CTCCCTCCTC ACCTCTCTGT GCCTCCTCAC 4260

CCGTCCAGAA TGAGCGGGAC CGGATCAGCA CTCGAAGGTC AAGCTATGAG GACAGCAGCC 4320

TGCCCTCCAT CAATGCGCTC CTGCAGGCGG AGGTCCTGTC CCGACAGGTA CCGGGGTGAT 4380

CCTGCCACCC ACCCAGGGAT CCCCCACACT ACAGAGGAGC TCACCTCCTC CACCTCCATT 4440 

CTCCCCAGCC AGGCCCTGGA GCAGCTGACG GGAGGGGCCT CAGATATTAC AGAAGGGACA 4500 

CTGAGTGCGG TTTCACATGG CCCAGTTTGC AGCAAGGGCA GGAATCGAAC CTGGCGCCCT 4560 

GGGGCACTTT CTAATTCATC CTACTGCCTG CATCCCACAG GCCAAGCAGA GTCTTCACCT 4620 

TCACTGAGGG CCTGCGATCA GCTCAGCTCC GAGAGAACAG AGCAGTGGCT CAGTGGAGAG 4680 

AGGTGGCAAA GTGGGGCCCA GCCCTTCCCT TGCTGAGTGA CCTTGGGCAA GTCACAGCAC 4740

CTCTCTGAGC CATGGTTGCC TCATTGTCAG AAAAGGATGA TGATTTTTTG CCCTGCTTCT 4800 

CCTCTAAGGC TGACAGACTC CTTGGGGCTC TAAAGCTGTT CTCCCTCATC CCTGCCTCCT 4860

CCCTCCCTCC GTTTTTACCC TGAGCTTCCT TCAGAGCTGG AGGGCACCCA CTATCCAGCC 4920

CCCTCCCCAC ATCTGATTCC AGGGAGGGGG CTCTGTGCAG GGGACAGAGA ATGCGGGAGG 4980

GCCCGGACAT CTCCAGCAJT TTCTTCCCTG TATCTCTCGA AGATCACCTC CCCCGTCTCC 5040

GGGATCAACG GCGACATTCG GGCGAAGAAG ATTGCCAGCA TCGCAGATGT GTGTGAGTCC 5100 

ATGAAGGAGC AGCTGCTGGT TCTCGTTGAG TGGGCCAAGT ACATCCCAGC TTTCTGCGAG 5160 

CTCCCCCTGG ACGACCAGGT GAGGATGGGC GTGGATGGTG GGCAGTAGTG GGCAGTGGGC 5220 

GGGGCAGCCA GGGGGCTGCT GGCCCACCTG GGATATAGCC GTGGACTGGC TTGATTTTAT 5280 

TTTATTTAAC AAAATATGTA GTGCACACAC GTGTCTGAAA CTTTAAATCA CCTTACAAAT 5340 

ATTAACTCAG TTAGCTCCTC CAACAACTCT ATGAGGTAGG TACTAAGGTA CTATTATTAC 5400

TGCCATCTCA TAGGTGAGGA GATTGGGGCA CAGAGAGGTT AAGTAACCTG CTCAAGGTCA 5460 

CATAGCTACT ATCCAGCATA GCTGGGATTT TTACAAAGCA CCCTTCATAA TTCTCCATAG 5520

CTGGTCCATG GGTGGGAATT TGGGACCCAC AGTTTTGGAA CTTTTTGGGA TCATAGACCT 5580

TTTTGAGAAT CTCAAAAAAG AAAAAAAAAG CACACAGAAT GTTGCTTACA GTTTCATCAG 5640 

GCACACAGAA GAGGCCCAGC ACGAAGCAGT TTCTTGCCCA AGGACACAGC AGTTCAAGGA 5700 

CAGAGTCAGC GCGAGGTCTC TCAGCTCTGA GCACATGTTC TTTCCCCTTC CAGGTTTCTA 5760

GTTTTATGGG TAGTAGTTTT ATGATGCCCA TTTCACAGTT CAGGCAGGTA GAGGCAGAGG 5820 

GGAGCATTAA GCTGACTTGC CCAGCGTCAC TGAGTTGGCT ACGGGCAGCC TTCCCAAGGG 5880
TACAGATGGC AAACAGTGTT CCTTCTCTCT TTCAGGTGGC CCTGCTCAGA GCCCATGCTG 5940

GCGAGCACCT GCTGCTCGGA GCCACCAAGA GATCCATGGT GTTCAAGGAC GTGCTGCTCC 6000

TAGGTGAGGC GGCTGCCTGC CCTGGCCAGG GCTCCAGGGA GGGTATGCCT AGCATGGCAC 6060

TCACCCAGGC AAGGAGATTC ACATGGTGGC ATGCAAGGGT GAGGGAGACT AGTCAGGAGT 6120

GGCCCTGTCC TCAGGCTTGC ATTGGAGGGC TCCAGGACTC AGTTTTCAAC TGGGTACCCC 6180

ACTCAGATGC AAGGAAATGT GGATGCAAGT CACCAAATTC CCAGCATTGA AGTCAGAGCA 6240

CGATCAGGGT TATCCCTGGA ATTACCTGTG CATCCTTTTT TCTTTTGACA GAGTCTTGCT 6300

CTGTCACTCA GGCTGGAGTG CAATGATGTG AGCAAACACT ACCTATTTTA ATATAACAAT 6360

GCTATGAGGG AGCTCGATTA TTTATCCTCA TCTTATAGAT AAGAAAACTG AGGCACAGAG 6420

AGGTTAAGTA ACTTATCCAA CTATAACCAG CTATCAGGGG CAGAGCCATT TAAGCAGGGC 6480

AGTGCAGTTC CAGAATCTGG TCCTTTAACC TTGATGCTTT GGTGCCTATC AGGTGACCTT 6540

TGAATGTCAT CGATCTTGTG AGTCATGTTG GTAAATGGAG CTTGGGTCAT GTGAAAGAGG 6600

TCCTAGAAAG CCAAGTTCCA AGCTCAGCCG GATGACTCAA GGCAGCTTAT CTTCTGAATC 6660

TGGGCCTCAG CTTCCTTACC TGTGAAATGG GAGTCACCAT CCCTGCAGGT CCTCCTCCCA 6720

CAGGCACCAG CTATCTTGCC AACTTAAAAG CCAAAACTAG AGGAGAGGGG TCAACCCAAG 6780

GTGACTTCCC ATCCTCCCTC CCTCCCAACC CTTCCAGGCA ATGACTACAT TGTCCCTCGG 6840

CACTGCCCGG AGCTGGCGGA GATGAGCCGG GTGTCCATAC GCATCCTTGA CGAGCTGGTG 6900

CTGCCCTTCC AGGAGCTGCA GATCGATGAC AATGAGTATG CCTACCTCAA AGCCATCATC 6960

TTCTTTGACC CAGGTACAGT GCACACCTCC TAAGCCATCC CTGACTCTCT CTCCAGAACG 7020

CTCTGCCAGA CTTCTCCTAT TGGGTTCTGT ACACTGAGTT CACAGCCTCA TCTCATGTTA 7080

ACGACAGCCA GGAGAGGCCG TTTTCATTTA ACAGATGAGG CAAGTCAAGA TTTGAAGAGA 7140

CAATATGGCC GGGCGCAGTG GCTCACACCT GTAATCCCAT CACTTTGGGA GGCTGAGGCG 7200

GGCGGATCAC CTGAGGTCAG GGGTCAAGAT GAGCCTGGCT AACATGGAGA AACCCCATCT 7260

CTACTTAAAA GTGGCTCTGC CAACAACTGG CTGTGCGACC CAGGACAAGT CCTATCTTTG 7320

CACTGTGTCT GGGTTTCCCC GTGTGTAAGA TGAGGCGGTT GCTAGGTGCT TATTGGATGC 7380

ATTCCTCAAG TCCCGCCCTC CATCTCCTAT TCCCCTCTCT TCTGGTTTAG TGCTTTAGGA 7440

AATGTGGCAG AAATCTTTTT CTGCCTGTGT CTAGGAAATC ATAATTCATG CTGGCGTACC 7500

CTGGTTGTTG AGGTCCCTGA ATCCTTGTGC CCACACTGCT GAAGACTCCT TGTGTGACAC 7560

AAGTCAGGGG ACATCTGGGT CTTGACTCCC CAGATGCTCC AGCTGGACCC TGCTGCCCTC 7620
CCiTGCCCAC CCTCTTCCAT TGTAGATGCC AAGGGGCTGA GCGATCCAGG GAAGATCAAG 7680

CGGCTGCGTT CCCAGGTGCA GGTGAGCTTG GAGGACTACA TCAACGACCG CCAGTATGAC 7740 

TCGCGTGGCC GCTTTGGAGA GCTGCTGCTG CTGCTGCCCA CCTTGCAGAG CATCACCTGG 7800 

CAGATGATCG AGCAGATCCA GTTCATCAAG CTCTTCGGCA TGGCCAAGAT TGACAACCTG 7860 

TTGCAGGAGA TGCTGCTGGG AGGTCCGTGC CAAGCCCAGG AGGGGCGGGG TTGGAGTGGG 7920 

GACTCCCCAG GAGACAGGCC TCACACAGTG AGCTCACCCC TCAGCTCCTT GGCTTCCCCA 7980 

CTGTGCCGCT TTGGGCAAGT TGCTTAACCT GTCTGTGCCT CAGTTTCCTC ACCAGAAAAA 8040

TGGGAACAAG GCAATGGTCT ATTTGTTCAG GCACCGAGAA CCTAGCACGT GCCAGTCACT 8100 

GTTCTAAGTG CTGGCAATTC AGCAAAGAAC AAGATCTTTG CCCTCGGGGA GGCTGTGTGT 8160 

GTGTGAGTAT GTATGGATGC GTGGATATCT GTGTATATGC CCGTATGTGC GTGCATGTGT 8220 

ATATAAAGCC TCACATTTTA TGATTTTGAA ATAAACAGGT AATATGAGGG ACACATAGAT 8280 

GCTATAAGTA GGTCAGTTGG CTGCAGCAGA GATGTGGGGG ATGAGGCTGA AAGGTGAGGC 8340

GGGACCAAAT GGTTGAAGGA CTTGCACTCC AAGGAGCTTT GAGAGCCATT GATTACATCC 8400

ATTATGTTAC TATGTGACCA ATACATTACT CATTAGAACA TTTACGTGAT CTCAGAGCTT 8460 

CCTTATATGC ACCTTGTTCC TTTCAACTCA CTTTTGTTCT CTTGGTTTTT TGGGGTCCTC 8520 

TTAACACCCT CATGAAGTCT ATAGATGGGA ATGGTACACC CTAGTTTACT AACCCAGGAA 8580 

TAGGTACCCA ACAGGCACTG CCAATATTGG ATGGGCTGGT TGATTGGCCA CGCCTGAGGA 8640

AGATGGCGTC CCAAGGCCTG AGGTCTGCAT CCCAGACTCT CCATCCTGAT CGACCTTCTC 8700 

TACCTGCAGG GTCCCCCAGC GATGCACCCC ATGCCCACCA CCCCCTGCAC CCTCACCTGA 8760

TGCAGGAACA TATGGGAACC AACGTCATCG TTGCCAACAC AATGCCCACT CACCTCAGCA 8820 

ACGGACAGAT GTGTGAGTGG CCCCGACCCA GGGGACAGGC AGGTGGGCAA ACTCTGGGAT 8880 

TTTACCTTGC AAAGGGTGAG GATGGGGCTT AAGACAGGAG GCAGGAGAAA GTGGAGTCTA 8940 

GAAGGTAGAA CCAGGATGCA ACAGTTTTCT GGGTTCCAGG GTAGGGAATA AAGGGCAAGA 9000 

TTGTCCATTT GTTGAGGCTG TTTATTCAGT AAGGTGACTG ACAGCCTTTA CTGAATGAAG 9060

CCATTGTTGG GATGAGGCAA TCCACTGGAT GAGGTAACCC ATTGGGTGAA GATGTCTTGG 9120 

GTGAGAATTC CATTAGTTGA CATTGTCCAT TAAGTAAAAG TGGTCATTGA AGTAAGGCTG 9180 

CACAGTTGGG TAAGGCTATC CATTAGACAT TAGATGAGAC TACCCATTGG GTCAGGATGT 9240 

CTGCTGGGCT ATTTGGGAGA AGCAGTCCAA GTCTGCATAT CAAATAAATG ATGGAGGAGA 9300
TGGGTGGTAG GACCTTCCAG ACCTCATAAA ACTTAGGCTT TATGATCTGG GACTCACAGA 9360

AGGTTGAGCA ATAAAAGACC TTAGGGATTA TCTGGCTTAA TTAATTCTCT CATTTTATAG 9420

AGGAAGAAAT TAAGTCAAGG TGGGGCAGGG TGGGAGGGGA GAACTTTCCC GGGGCTCTTC 9480

ATTTACTCCC ACAAAGGCTG GAATTTTGAG CAGCCCCTGT CTGTCTGTTT GTCCTTCCCC 9540

ACCCCTGAGA CCCCACAGCC CTCACCGCCA GGTGGCTCAG GGTCTGAGCC CTATAAGCTC 9600

CTGCCGGGAG CCGTCGCCAC AATCGTCAAG CCCCTCTCTG CCATCCCCCA GCCGACCATC 9660

ACCAAGCAGG AAGTTATCTA GCAAGCCGCT GGGGCTTGGG GGCTCCACTG GCTCCCCCCA 9720

GCCCCCTAAG AGAGCACCTG GTGATCACGT GGTCACGGCA AAGGAAGACG TGATGCCAGG 9780

ACCAGTCCCA GAGCAGGAAT GGGAAGGATG AAGGGCCCGA GAACATGGCC TAAGGCACAT 9840

CCCACTGCAC CCTGACGCCC TGCTCTGATA ACAAGACTTT GACTTGGGGA GACCCTCTAC 9900

TGCCTTGGAC AACTTTCTCA TGTTGAAGCC ACTGCCTTCA CCTTCACCTT CATCCATGTC 9960

CAACCCCCGA CTTCATCCCA AAGGACAGCC GCCTGGAGAT GACTTGAGCC TTAC 10014



(2) INFORMATION FOR SEQ ID NO: 131: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 567 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 131: 

Met Arg Leu Ser Lys Thr Leu Val Asp Met Asp Met Ala Asp Tyr Ser
1 5 10 15

Ala Ala Leu Asp Pro Ala Tyr Thr Thr Leu Glu Phe Glu Asn Val Gln
20 25 30

Val Leu Thr Met Gly Asn Gly Pro Ser Ser Pro His Cys Leu Thr Val
35 40 45

Ala Leu Leu Gly Ala Trp His Ser Asp Met Met Ile Leu Leu Pro Leu
50 55 60

Arg Leu Ala Arg Leu Arg His Pro Leu Arg His His Trp Ser Ile Ser
65 70 75 80

Gly Gly Val Asp Ser Ser Pro Gln Gly Asp Thr Ser Pro Ser Glu Gly
85 90 95

Thr Asn Leu Asn Ala Pro Asn Ser Leu Gly Val Ser Ala Leu Cys Ala
100 105 110

Ile Cys Gly Asp Arg Ala Thr Gly Lys His Tyr Gly Ala Ser Ser Cys
115 120 125

Asp Gly Cys Lys Gly Phe Phe Arg Arg Ser Val Arg Lys Asn His Met
130 135 140

Tyr Ser Cys Arg Phe Ser Arg Gln Cys Val Val Asp Lys Asp Lys Arg
145 150 155 160

Asn Gln Cys Arg Tyr Cys Arg Leu Lys Lys Cys Phe Arg Ala Gly Met
165 170 175

Lys Lys Glu Ala Val Gln Asn Glu Arg Asp Arg Ile Ser Thr Arg Arg
180 185 190

Ser Ser Tyr Glu Asp Ser Ser Leu Phe Ser Ile Asn Ala Leu Leu Gln
195 200 205

Ala Glu Val Leu Ser Arg Gln Ile Thr Ser Pro Val Ser Gly Ile Asn
210 215 220

Gly Asp Ile Arg Ala Lys Lys Ile Ala Ser Ile Ala Asp Val Cys Glu
225 230 235 240

Ser Met Lys Glu Gln Leu Leu Val Leu Val Glu Trp Ala Lys Tyr Ile
245 250 255

Pro Ala Phe Cys Glu Leu Pro Leu Asp Asp Gln Val Ala Leu Leu Arg
260 265 270

Ala His Ala Gly Glu His Leu Leu Leu Gly Ala Thr Lys Arg Ser Met
275 280 285

Val Phe Lys Asp Val Leu Leu Leu Gly Asn Asp Tyr Ile Val Pro Arg
290 295 300

His Cys Pro Glu Leu Ala Glu Met Ser Arg Val Ser Ile Arg Ile Leu
305 310 315 320

Asp Glu Leu Val Leu Pro Phe Gln Glu Leu Gln Ile Asp Asp Asn Glu
325 330 335

Tyr Ala Tyr Leu Lys Ala Ile Ile Phe Phe Asp Pro Asp Ala Lys Gly
340 345 350

Leu Ser Asp Pro Gly Lys Ile Lys Arg Leu Arg Ser Gln Val Gln Val
355 360 365

Ser Leu Glu Asp Tyr Ile Asn Asp Arg Gln Tyr Asp Ser Arg Gly Arg
370 375 380

Phe Gly Glu Leu Leu Leu Leu Leu Pro Thr Leu Glu Ser Ile Thr Trp
385 390 395 400

Gln Met Ile Glu Gln Ile Gln Phe Ile Lys Leu Phe Gly Met Ala Lys
405 410 415

Ile Asp Asn Leu Leu Gln Glu Met Leu Leu Gly Gly Gly Pro Cys Gln
420 425 430

Ala Gln Glu Gly Arg Gly Trp Ser Gly Asp Ser Pro Gly Asp Arg Pro
435 440 445

His Thr Val Ser Ser Pro Leu Ser Ser Leu Ala Ser Pro Leu Cys Arg
450 455 460

Phe Gly Gln Val Ala Gly Ser Pro Ser Asp Ala Pro His Ala His His
465 470 475 480

Pro Leu His Pro His Leu Met Gln Glu His Met Gly Thr Asn Val Ile
485 490 495

Val Ala Asn Thr Met Pro Thr His Leu Ser Asn Gly Gln Met Cys Glu
500 505 510

Trp Pro Arg Pro Arg Gly Gln Ala Ala Thr Pro Glu Thr Pro Gln Pro
515 520 525

Ser Pro Pro Gly Gly Ser Gly Ser Glu Pro Tyr Lys Leu Leu Pro Gly
530 535 540

Ala Val Ala Thr Ile Val Lys Pro Leu Ser Ala Ile Pro Gln Pro Thr
545 550 555 560

Ile Thr Lys Gln Glu Val Ile
565

(2) INFORMATION FOR SEQ ID NO: 132: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 470 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 132: 

AAGTAAGCCT TGTTTTTCCA CACTCATTCT CCCAGGTTTT CTTTGGATAG GCTTACTTTT 60 

CCATGCTGGA GGAGGGGCTA TCCCTTCATT TTGCCTCTCC CGCTTCCCTC CCTCTCCCCC 120

TCCCCCTGCT TTCTCTCCCT CTGCACTTTG TGAACTGCTG CTGCAGTGCT GAAGTCCAAA 180

GTTCAGTAAC TTGCTAAGCA CACAGATAAA TATGAACCTT GGAGAATTTA CCAATGTAAA 240 

CAGATAGCCA AGGGTCCCTT TATCAGCACT GGCTCAGGAC AGTCGTGGGG GGTCTGAAGT 300 

GGCTCAATTT TGTATTTTGT TTTTTTTGGG GGGGTGTAAA GGCGGGAGGC TGCGCTGTGC 360 

CCGCTGCTGA CAGTCGGGCG TGTTACCTCG GGAACATGGT GTAGGGAAGC TGGAAGCAGG 420 

ATAACGTGGA ACTCAACCCA AGAAACGCCA GCCTGAAGAC CATGGTCTCG 470

(2) INFORMATION FOR SEQ ID NO: 133:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 467 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 133: 

TCACAGCTAT TAGCTCATCG CTGCCAAATT GCCCCTTTAC CTAGGCTTGT GTCACTTTCA 60 

CCTTCTCATT CTCTTACTTT TACATTCTTC CTTGATATTT TGCTTTTTCA ACTTTTGGAA 120 

ATTTCTTTCT CTCTTCTACC CCTCCTCATA TTCCTCTGCA CTCCCCCCTC TCTAACTCAT 180 

GCACTTTGTG GGGTCCAAAG TTCAGTAACT TGCAAAGCAC AGGGATAAAG ATGAACCTTG 240

GAAGATTTAC TCTGCTCTGA TGTAAACAGA GAGTGACAAG GGTCCCTTAT CTATGTCTCA 300 

GAGAAGCCTG TCCGGGGGGT GACCACTTGC TGGTTGTGGC TGCACAGTGT GTTTTTTTGG 360 

GGGGGAGGAG GAAACAGAAG GTGGGTAGAG CATGGACTCC CGCCCGCTGA TCCGTGTTAC 420

AGCCGCAGAT GGTGAGGCAG TAGAAGGCAA CAGACAGGAT GGCGTCT 467 

(2) INFORMATION FOR SEQ ID NO: 134: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 479 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 134: 

TTTCGGGGGT GGGACCCAAC GCTGCTCTCC TGATGGCCTC CCTGGCTCCC AGCACCTTCC 60

ATCCCAGCTG CTCAGGGCCC CTCACCTGCG CCTCCCCCAC CCTCCCCTCT GCCCACTCCC 120 

ATCGCAGGCC ATAGCTCCCT GTCCCTCTCC GCTGCCATGA GGCCTGCACT TTGCAGGGCT 180 

GAAGTCCAAA GTTCAGTCCC TTCGCTAAGC ACACGGATAA ATATGAACCT TGGAGAATTT 240

CCCCAGCTCC AATGTAAACA GAACAGGCAG GGGCCCTGAT TCACGGGCCG CTGGGGCCAG 300

GGTTGGGGGT TGGGGGTGCC CACAGGGCTT GGCTAGTGGG GTTTTGGGGG GGCAGTGGGT 360

GCAAGGAGTT TGGTTTGTGT CTGCCGGCCG GCAGGCAAAC GCAACCCACG CGGTGGGGGA 420

GGCGGCTAGC GTGGTGGACC CGGGCCGCGT GGCCCTGTGG CAGCCGAGCC ATGGTTTCT 479 



(2) INFORMATION FOR SEQ ID NO: 135:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 605 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 135: 

TGGGGCCTGG GATTTAGGTT TCTAAATCGT GGGCCATGGG GCAGCCTTAT CTCTGCAAAA 60 

GCATTGAGGG TAGAAGTCAA TGATTTGGGA AGTTATTGAA TTAGGGGATC TCGGAGGTAG 120 

GCTGTCAGTG CCTGATAGTA TCAGTTAGAA TGCCTGACTT GGGGTGACAA TGGCTTGGAG 180 

GGGTGGGTGA GTCAAGGGTC AAATGAGTGC CCGTGAGTCA TGATGCCTGC CTTGTACAAT 240 

TGATAACTGA ACATCGGTGA GTTAGGGCCC CAGCAGTTGT AATTAGCACC CCGGGTGTCA 300 

GCCAGAAACC AACAAACAGC CAAATCCCTG CAGCCCCGCC CAGCCTATCC ACCGGCGGGG 360 

GACCGATTAA CCATTAACCC CCACCCCTCC CCGGCAGAGC CTCCACCCCT TCACAGAGGC 420 

TAGGCCAAGA CTCCCAGCAG ATCTTCCCAG AGGACGGTTT GAAAGGAAGG CAGAGAGGGC 480 

ACTGGGAGGA GGCAGTGGGA GGGCGGAGGG CGGGGGCCTT CGGGGTGGGC GCCCAGGGTA 540

GGGCAGGTGG CCGCGGCGTG GAGGCAGGGA GAATGCGACT CTCCAAAACC CTCGTCGACG 600

ACATG 605



(2) INFORMATION FOR SEQ ID NO: 136: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 478 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 136: 

TCCTGGAGAG TGGGACCCAG CGCCGCACCC AGAGGCCTCC TGGCTCCTGC TGCCTCTAGC 60 

CCTGCGCCCC TGGCCCCTCT CCACCTCCCC CACCCTCCCT TCTGCTCACT CCCAATTGCA 120 

GGCCATGACT CCGGTCCGCG TCCCTCTCAC CCCCATGAGG CCTGCACTTG CAAGGCTGAA 180 

GTCCAAAGTT CAGTCCCTTC GCTAAGCGCA CGGATAAATA TGAACCTTGG AGAATTTCCC 240 

CAGCTCCAAT GTAAACAGAG CAGGCAGGGG CCCTGATTCA CTGGCCGCTG GGGCCAGGGT 300

TGGGGGCTGG GGGTGCCCAC AGAGCTTGAC TAGTGGGATT TGGGGGGGCA GTGGGTGCAG 360 

CGAGCCCGGT CCGTTGACTG CCAGCCTGCC GGCAGGTAGA CACCGGCCGT GGGTGGGGGA 420 

GGCGGCTAGC TCAGTGGCCT TGGGCCGCGT GGCTGGTGGC AGCGGAGCCA TGGTTTCT 478 
(2) INFORMATION FOR SEQ ID NO: 137:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 622 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 137: 

TGGGCTTGGG TGTTAGGTTT CCAGTTCAAG CGACCCAGGA CAGCTTTATC TCAAATTGAG 60 

GATAGAAGTC AATGATCTGG GACGTGATTG GCTTAGGGCT TCATAGTGGT AGGCTTGCCA 120

GTGTCTAAAC ATGTCAGCTG GGTTGTCCAC CTTGGTGAGA CTTGGGGGCT GCTGAGGCAA 180

GGGGTCCAAC CAATGCCAGT CCTGTTGGGT GCCTGCCTTG GAAGATTGGT AAGTGACTAT 240 

TAATGAGCGG GAGGTGGGGG GGGGGCAACA GTTGTAATTA GCACCCCAGG TGTCAGTCAG 300 

AAACCAACAA ACAGCCAAAT CCTCGTGGCT CCACCCAGCC TACCCAGCAA CGGGGGTGAT 360

TAACCATTAA CTCCTACCCC TCCCCACAGA GCCTCCACCC TCTGCAGAGG CTAGGCCAGG 420

ACGCCAGGCT GAGTCTCCCA GAGGACAGTT TGAAAGAGAG GAAGGCAGAG AAGGGACCTG 480

GGAGGAGGCA GGAGGAGGGC GGGGACGGGG GGGGCTGGGG CTCAGCCCAG GGGCTTGGGT 540

GGCATCCTGG GCCGGGCAGG ACAGGGGGCT AAGGCGTGGG TAGGGGAGAA TGCGACTCTC 600

TAAAACCCTT GCCGGCGATA TG 622 

(2) INFORMATION FOR SEQ ID NO: 138:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 470 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 138: 

TCTTGGGCAG TGGGACCAGC GCTGCTCCCA GAGGCCTCCT GGCTCCTGGT GCCTCTCTCC 60 

CTGCGCCCCT GGTTCCCGCT CCACCTCCCC CACCCGCCCT TCTGCTCACT CCCAATTGCA 120 

AGCCATGGCT CCCGGTCCGG TCCCTCTCGC TGCTGTGAGG CCTGCACTTG CAAGGCTGAA 180 

GTCCAAAGTT CAGTCCCTTC GCTAAGCACA CGGATAAATA TGAACCTTGG AGAATTTCCC 240 

CAGCTCCAAT GTAAACAGAG CAGCAGGGGG CCCTGATTCA CTAGCCGCTG GGGCCAGGGT 300

TGGGGGTTGG GGGTGCCCAC AGGGCTTGAC TAGTGGGATT TGGGGGAGCA GTGGGTGCAG 360

CGAGCCTGGT CCGTTGACTG CCAGCAGTAG ACACCGGCCG TGTGTGGGGG AGGCGGCTAG 420

CTCAGTGGCC TTGGGCCGCG TGGCCTGGCG GTAGAGGAGC CATGGTTTCT 470

(2) INFORMATION FOR SEQ ID NO: 139: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 557 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 139:

Met Val Ser Lys Leu Thr Ser Leu Gln Gln Glu Leu Leu Ser Ala Leu
1 5 10 15

Leu Ser Ser Gly Val Thr Lys Glu Val Leu Val Gln Ala Leu Glu Glu
20 25 30

Leu Leu Pro Ser Pro Asn Phe Gly Val Lys Leu Glu Thr Leu Pro Leu
35 40 45

Ser Pro Gly Ser Gly Ala Glu Pro Asp Thr Lys Pro Val Phe His Thr
50 55 60

Leu Thr Asn Gly His Ala Lys Gly Arg Leu Ser Gly Asp Glu Gly Ser
65 70 75 80

Glu Asp Gly Asp Asp Tyr Asp Thr Pro Pro Ile Leu Lys Glu Leu Gln
85 90 95

Ala Leu Asn Thr Glu Glu Ala Ala Glu Gln Arg Ala Glu Val Asp Arg
100 105 110

Met Leu Ser Glu Asp Pro Trp Arg Ala Ala Lys Met Ile Lys Gly Tyr
115 120 125

Met Gln Gln His Asn Ile Pro Gln Arg Glu Val Val Asp Val Thr Gly
130 135 140

Leu Asn Gln Ser His Leu Ser Gln His Leu Asn Lys Gly Thr Pro Met
145 150 155 160

Lys Thr Gln Lys Arg Ala Ala Leu Tyr Thr Trp Tyr Val Arg Lys Gln
165 170 175

Arg Glu Ile Leu Arg Gln Phe Asn Gln Thr Val Gln Ser Ser Gly Asn
180 185 190

Met Thr Asp Lys Ser Ser Gln Asp Gln Leu Leu Phe Leu Phe Pro Glu
195 200 205

Phe Ser Gln Gln Ser His Gly Pro Gly Gln Ser Asp Asp Ala Cys Ser
210 215 220

Glu Pro Thr Asn Lys Lys Met Arg Arg Asn Arg Phe Lys Trp Gly Pro
225 230 235 240






Ala Ser Gln Gln Ile Leu Tyr Gln Ala Tyr Asp Arg Gln Lys Asn Pro
245 250 255

Ser Lys Glu Glu Arg Glu Ala Leu Val Glu Glu Cys Asn Arg Ala Glu
260 265 270

Cys Leu Gln Arg Gly Val Ser Pro Ser Lys Ala His Gly Leu Gly Ser
275 280 285

Asn Leu Val Thr Glu Val Arg Val Tyr Asn Trp Phe Ala Asn Arg Arg
290 295 300

Lys Glu Glu Ala Phe Arg Gln Lys Leu Ala Met Asp Ala Tyr Ser Ser
305 310 315 320

Asn Gln Thr His Ser Leu Asn Pro Leu Leu Ser His Gly Ser Pro His
325 330 335

His Gln Pro Ser Ser Ser Pro Pro Asn Lys Leu Ser Gly Val Arg Tyr
340 345 350

Ser Gln Gln Gly Asn Asn Glu Ile Thr Ser Ser Ser Thr Ile Ser His
355 360 365

His Gly Asn Ser Ala Met Val Thr Ser Gln Ser Val Leu Gln Gln Val
370 375 380

Ser Pro Ala Ser Leu Asp Pro Gly His Asn Leu Leu Ser Pro Asp Gly
385 390 395 400

Lys Met Ile Ser Val Ser Gly Gly Gly Leu Pro Pro Val Ser Thr Leu
405 410 415

Thr Asn Ile His Ser Leu Ser His His Asn Pro Gln Gln Ser Gln Asn
420 425 430

Leu Ile Met Thr Pro Leu Ser Gly Val Met Ala Ile Ala Gln Ser Leu
435 440 445

Asn Thr Ser Gln Ala Gln Ser Val Pro Val Ile Asn Ser Val Ala Gly
450 455 460

Ser Leu Ala Ala Leu Gln Pro Val Gln Phe Ser Gln Gln Leu His Ser
465 470 475 480

Pro His Gln Gln Pro Leu Met Gln Gln Ser Pro Gly Ser His Met Ala
485 490 495

Gln Gln Pro Phe Met Ala Ala Val Thr Gln Leu Gln Asn Ser His Met
500 505 510

Tyr Ala His Lys Gln Glu Pro Pro Gln Tyr Ser His Thr Ser Arg Phe
515 520 525

Pro Ser Ala Met Val Val Thr Asp Thr Ser Ser Ile Ser Thr Leu Thr
530 535 540

Asn Met Ser Ser Ser Lys Gln Cys Pro Leu Gln Ala Trp
545 550 555



(2) INFORMATION FOR SEQ ID NO: 140:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 516 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 140:

Met Asp Met Ala Asp Tyr Ser Ala Ala Leu Asp Pro Ala Tyr Thr Thr
1 5 10 15

Leu Glu Phe Glu Asn Val Gln Val Leu Thr Met Gly Asn Gly Pro Ser
20 25 30

Ser Pro His Cys Leu Thr Val Ala Leu Leu Gly Ala Trp His Ser Asp
35 40 45

Met Met Ile Leu Leu Pro Leu Arg Leu Ala Arg Leu Arg His Pro Leu
50 55 60

Arg His His Trp Ser Ile Ser Gly Gly Val Asp Ser Ser Pro Gln Gly
65 70 75 80

Asp Thr Ser Pro Ser Glu Gly Thr Asn Leu Asn Ala Pro Asn Ser Leu
85 90 95

Gly Val Ser Ala Leu Cys Ala Ile Cys Gly Asp Arg Ala Thr Gly Lys
100 105 110

His Tyr Gly Ala Ser Ser Cys Asp Gly Cys Lys Gly Phe Phe Arg Arg
115 120 125

Ser Val Arg Lys Asn His Met Tyr Ser Cys Arg Phe Ser Arg Gln Cys
130 135 140

Val Val Asp Lys Asp Lys Arg Asn Gln Cys Arg Tyr Cys Arg Leu Lys
145 150 155 160

Lys Cys Phe Arg Ala Gly Met Lys Lys Glu Ala Val Gln Asn Glu Arg
165 170 175

Asp Arg Ile Ser Thr Arg Arg Ser Ser Tyr Glu Asp Ser Ser Leu Phe
180 185 190

Ser Ile Asn Ala Leu Leu Gln Ala Glu Val Leu Ser Arg Gln Ile Thr
195 200 205

Ser Pro Val Ser Gly Ile Asn Gly Asp Ile Arg Ala Lys Lys Ile Ala
210 215 220

Ser Ile Ala Asp Val Cys Glu Ser Met Lys Glu Gln Leu Leu Val Leu
225 230 235 240

Val Glu Trp Ala Lys Tyr Ile Pro Ala Phe Cys Glu Leu Pro Leu Asp
245 250 255

Asp Gln Val Ala Leu Leu Arg Ala His Ala Gly Glu His Leu Leu Leu
260 265 270

Gly Ala Thr Lys Arg Ser Met Val Phe Lys Asp Val Leu Leu Leu Gly
275 280 285

Asn Asp Tyr Ile Val Pro Arg His Cys Pro Glu Leu Ala Glu Met Ser
290 295 300

Arg Val Ser Ile Arg Ile Leu Asp Glu Leu Val Leu Pro Phe Gln Glu
305 310 315 320

Leu Gln Ile Asp Asp Asn Glu Tyr Ala Tyr Leu Lys Ala Ile Ile Phe
325 330 335

Phe Asp Pro Asp Ala Lys Gly Leu Ser Asp Pro Gly Lys Ile Lys Arg
340 345 350

Leu Arg Ser Gln Val Gln Val Ser Leu Glu Asp Tyr Ile Asn Asp Arg
355 360 365

Gln Tyr Asp Ser Arg Gly Arg Phe Gly Glu Leu Leu Leu Leu Leu Pro
370 375 380

Thr Leu Glu Ser Ile Thr Trp Gln Met Ile Glu Gln Ile Gln Phe Ile
385 390 395 400

Lys Leu Phe Gly Met Ala Lys Ile Asp Asn Leu Leu Gln Glu Met Leu
405 410 415

Leu Gly Gly Ser Pro Ser Asp Ala Pro His Ala His His Pro Leu His
420 425 430

Pro His Leu Met Gln Glu His Met Gly Thr Asn Val Ile Val Ala Asn
435 440 445

Thr Met Pro Thr His Leu Ser Asn Gly Gln Met Cys Glu Trp Pro Arg
450 455 460

Pro Arg Gly Gln Ala Ala Thr Pro Glu Thr Pro Gln Pro Ser Pro Pro
465 470 475 480

Gly Gly Ser Gly Ser Glu Pro Tyr Lys Leu Leu Pro Gly Ala Val Ala
485 490 495

Thr Ile Val Lys Pro Leu Ser Ala Ile Pro Gln Pro Thr Ile Thr Lys
500 505 510

Gln Glu Val Ile
515
(2) INFORMATION FOR SEQ ID NO: 141:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 17 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 141: 
GCGGGACCGG ATCAGCA 



(2) INFORMATION FOR SEQ ID NO: 142: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 5 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 142: 

Arg Asp Arg Ile Ser 
1 5 



(2) INFORMATION FOR SEQ ID NO: 143: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 17 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 143: 
GCGGGACTGG ATCAGCA 
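
SEQ ID NO: 141 and SEQ ID NO: 143 differ at a single base. The following Python sketch locates that difference and translates both oligonucleotides with the standard genetic code. It assumes the reading frame starts at the second base, which is the frame in which SEQ ID NO: 141 yields the peptide of SEQ ID NO: 142. The helper names `point_differences` and `translate` are illustrative and do not appear in the specification.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

SEQ_141 = "GCGGGACCGGATCAGCA"
SEQ_143 = "GCGGGACTGGATCAGCA"


def point_differences(ref, alt):
    """Return (1-based position, ref base, alt base) for each mismatch."""
    if len(ref) != len(alt):
        raise ValueError("sequences must have equal length")
    return [(i + 1, r, a) for i, (r, a) in enumerate(zip(ref, alt)) if r != a]


def translate(seq, offset=0):
    """Translate the complete codons of seq, starting at a 0-based offset."""
    codons = (seq[i:i + 3] for i in range(offset, len(seq) - 2, 3))
    return "".join(CODON_TABLE[c] for c in codons)


print(point_differences(SEQ_141, SEQ_143))
print(translate(SEQ_141, offset=1), translate(SEQ_143, offset=1))
```

Under the stated frame assumption, the sketch reports one C to T change at position 8 and the peptides RDRIS and RDWIS, that is, an Arg to Trp substitution at the third residue.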



(2) INFORMATION FOR SEQ ID NO: 144: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 7 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 144: 

Ala Glu Val Leu Ser Arg Gln 
1 5 



(2) INFORMATION FOR SEQ ID NO: 145: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ix) FEATURE: 

(A) NAME/KEY: modified_base 

(B) LOCATION: 16 

(D) OTHER INFORMATION: /note= "N = C or T" 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 145: 

GCGGAGGTCC TGTCCNGACA GGTACCGGGG 30 



(2) INFORMATION FOR SEQ ID NO: 146: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ix) FEATURE: 

(A) NAME/KEY: modified_base 

(B) LOCATION: 8 

(D) OTHER INFORMATION: /note= "N = C or T" 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 146: 
AAAGCAANGA GAGAT 15 
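
The feature note "N = C or T" means that each of SEQ ID NO: 145 and SEQ ID NO: 146 stands for two concrete oligonucleotides. A minimal Python sketch of that expansion follows; the function name `expand_degenerate` is hypothetical and is used here only for illustration.

```python
from itertools import product


def expand_degenerate(seq, alternatives="CT"):
    """List every sequence obtained by replacing each N with one alternative base."""
    slots = [alternatives if base == "N" else base for base in seq]
    return ["".join(choice) for choice in product(*slots)]


SEQ_145 = "GCGGAGGTCCTGTCCNGACAGGTACCGGGG"
SEQ_146 = "AAAGCAANGAGAGAT"

for name, seq in (("SEQ ID NO: 145", SEQ_145), ("SEQ ID NO: 146", SEQ_146)):
    # Report the 1-based position of N and the concrete sequences it represents.
    print(name, "N at position", seq.index("N") + 1, expand_degenerate(seq))
```

The position printed for SEQ ID NO: 145 agrees with the LOCATION field of its feature entry (16).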



(2) INFORMATION FOR SEQ ID NO: 147: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 4 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 

(ix) FEATURE: 

(A) NAME/KEY: Modified-site 

(B) LOCATION: 3 

(D) OTHER INFORMATION: /note= "X = R or any amino acid" 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 147: 



Lys Gln Xaa Glu 
1 

CLAIMS 

1. A method for screening for diabetes comprising: 

a) obtaining sample nucleic acid from an animal; and 

b) analyzing the nucleic acids to detect a mutation in an HNF-encoding nucleic segment; 
wherein a mutation in the HNF-encoding nucleic acid is indicative of a propensity for non-insulin 
dependent diabetes. 

2. The method of claim 1, wherein the HNF-encoding nucleic acid is an HNF1α-encoding nucleic 
acid. 

3. The method of claim 2, wherein the HNF1α-encoding nucleic acid is located on human 
chromosome 12q. 

4. The method of claim 2, wherein the HNF1α-encoding nucleic acid is located at the MODY3 locus. 

5. The method of claim 1, wherein the HNF-encoding nucleic acid is an HNF4α-encoding nucleic 
acid. 

6. The method of claim 5, wherein the HNF4α-encoding nucleic acid is located on human 
chromosome 20. 

7. The method of claim 5, wherein the HNF4α-encoding nucleic acid is located at the MODY1 locus. 

8. The method of claim 1, wherein the HNF-encoding nucleic acid is an HNF1β-encoding nucleic 
acid. 

9. The method of claim 8, wherein the HNF4α-encoding nucleic acid is located at the MODY4 locus. 

10. The method of claim 1, wherein the nucleic acid is DNA. 

11. The method of claim 1, wherein the step of analyzing the HNF-encoding nucleic acid comprises 
sequencing of the HNF-encoding nucleic acid to obtain a sequence. 

12. The method of claim 11, wherein the sequence of the HNF encoding nucleic acid is compared to a 
native nucleic acid sequence of an HNF gene. 

13. The method of claim 12, wherein the sequence of the HNF encoding nucleic acid is compared to a 
native nucleic acid sequence of HNF1α. 

14. The method of claim 13, wherein the native nucleic acid sequence of HNF1α has a sequence set 
forth in SEQ ID NO: 2. 

15. The method of claim 12, wherein the sequence of the HNF encoding nucleic acid is compared to a 
native nucleic acid sequence of HNF4α. 

16. The method of claim 15, wherein the native nucleic acid sequence of HNF4α has a sequence set 
forth in SEQ ID NO:78. 

17. The method of claim 12, wherein the sequence of the HNF encoding nucleic acid is compared to a 
native nucleic acid sequence of HNF1β. 

18. The method of claim 17, wherein the native nucleic acid sequence of HNF1β has a sequence set 
forth in SEQ ID NO:90. 

19. The method of claim 1, wherein the HNF-encoding nucleic acid comprises at least one point 
mutation. 

20. The method of claim 1, wherein the HNF-encoding nucleic acid has a translocation mutation. 

21. The method of claim 1, wherein the HNF-encoding nucleic acid has a deletion mutation. 

22. The method of claim 1, wherein the HNF-encoding nucleic acid has an insertion mutation. 

23. The method of claim 1, wherein the HNF-encoding nucleic acid is an HNF1α-encoding nucleic acid 
and a mutation occurs in exon 2, exon 4, exon 6, or exon 9 of the HNF1α-encoding nucleic acid. 

24. The method of claim 1, wherein a mutation occurs at codon 131, 142, 159, 171, 289, 291, 292, 
273, 379, 401, 447, 547, or 548 of an HNF1α-encoding nucleic acid having the sequence of SEQ ID 
NO:1. 

25. The method of claim 1, wherein the HNF-encoding nucleic acid is an HNF1α-encoding nucleic acid 
and a mutation occurs at the splice acceptor of intron 5 or intron 9. 

26. The method of claim 1, wherein the HNF-encoding nucleic acid is an HNF1α-encoding nucleic acid 
and a mutation is a mutation defined in Table 8. 

27. The method of claim 1, wherein the HNF-encoding nucleic acid is an HNF4α-encoding nucleic acid 
and a mutation occurs in exon 7 of the HNF4α-encoding nucleic acid. 

28. The method of claim 1, wherein a mutation occurs at codon 268, 130 or 273 of an 
HNF4α-encoding nucleic acid having the sequence of SEQ ID NO:78. 

29. The method of claim 1, wherein the HNF-encoding nucleic acid is an HNF4α-encoding nucleic acid 
and a mutation is a mutation defined in Table 10. 

30. The method of claim 1, wherein the HNF-encoding nucleic acid is an HNF1β-encoding nucleic acid 
and a mutation occurs in exon 2, exon 7 or intron 8 of the HNF1β-encoding nucleic acid. 

31. The method of claim 1, wherein a mutation occurs at codon 177, 463, at nucleotide 48 of intron 
8, or at nucleotide 22 of intron 8 of an HNF1β-encoding nucleic acid having the sequence of SEQ ID 
NO:90. 

32. The method of claim 1, wherein the HNF-encoding nucleic acid is an HNF1β-encoding nucleic acid 
and a mutation is a mutation defined in Table 15. 

33. The method of claim 1, wherein the step of analyzing the HNF-encoding nucleic acid comprises 
PCR. 

34. The method of claim 1, wherein the step of analyzing the HNF-encoding nucleic acid comprises 
use of an RNase protection assay. 

35. The method of claim 1, wherein the step of analyzing the HNF-encoding nucleic acid comprises an 
RFLP procedure. 

36. A method of regulating diabetes in an animal comprising the step of modulating HNF function in 
the animal. 

37. The method of claim 36, further comprising the step of diagnosing an animal with diabetes via 
analysis of an HNF1α-encoding nucleic acid sequence for a mutation. 

38. The method of claim 36, wherein the step of modulating HNF function comprises providing an 
HNF1α polypeptide to the animal. 

39. The method of claim 38, wherein the HNF1α polypeptide is a native HNF1α polypeptide. 

40. The method of claim 39, wherein the native HNF1α polypeptide has the sequence of SEQ ID NO: 2. 

41. The method of claim 38, wherein the provision of an HNF1α polypeptide is accomplished by 
inducing expression of an HNF1α polypeptide. 

42. The method of claim 41, wherein the expression of an HNF1α polypeptide encoded in the 
animal's genome is induced. 

43. The method of claim 41, wherein the expression of an HNF1α polypeptide encoded by a nucleic 
acid provided to the animal is induced. 

44. The method of claim 38, wherein the provision of an HNF1α polypeptide is accomplished by a 
method comprising introduction of an HNF1α-encoding nucleic acid to the animal. 

45. The method of claim 38, wherein the provision of an HNF1α polypeptide is accomplished by 
injecting the HNF1α polypeptide into the animal. 

46. The method of claim 36, wherein the step of modulating HNF function in the animal comprises 
providing a modulator of HNF1α function to the animal. 

47. The method of claim 46, wherein the modulator of HNF1α function is an agonist of HNF1α. 

48. The method of claim 46, wherein the modulator of HNF1α function modulates transcription of an 
HNF1α-encoding nucleic acid. 

49. The method of claim 46, wherein the modulator of HNF1α function modulates translation of an 
HNF1α-encoding nucleic acid. 

50. The method of claim 36, further comprising the step of diagnosing an animal with diabetes via 
analysis of an HNF4α-encoding nucleic acid sequence for a mutation. 

51. The method of claim 36, wherein the step of modulating HNF function comprises providing an 
HNF4α polypeptide to the animal. 

52. The method of claim 51, wherein the HNF4α polypeptide is a native HNF4α polypeptide. 

53. The method of claim 51, wherein the native HNF4α polypeptide has the sequence of SEQ ID 
NO:79. 

54. The method of claim 51, wherein the provision of an HNF4α polypeptide is accomplished by 
inducing expression of an HNF4α polypeptide. 

55. The method of claim 54, wherein the expression of an HNF4α polypeptide encoded in the 
animal's genome is induced. 

56. The method of claim 54, wherein the expression of an HNF4α polypeptide encoded by a nucleic 
acid provided to the animal is induced. 

57. The method of claim 51, wherein the provision of an HNF4α polypeptide is accomplished by a 
method comprising introduction of an HNF4α-encoding nucleic acid to the animal. 

58. The method of claim 51, wherein the provision of an HNF4α polypeptide is accomplished by 
injecting the HNF4α polypeptide into the animal. 

59. The method of claim 36, wherein the step of modulating HNF function in the animal comprises 
providing a modulator of HNF4α function to the animal. 

60. The method of claim 59, wherein the modulator of HNF4α function is an agonist of HNF4α. 

61. The method of claim 59, wherein the modulator of HNF4α function modulates transcription of an 
HNF4α-encoding nucleic acid. 

62. The method of claim 59, wherein the modulator of HNF4α function modulates translation of an 
HNF4α-encoding nucleic acid. 

63. The method of claim 36, further comprising the step of diagnosing an animal with diabetes via 
analysis of an HNF1β-encoding nucleic acid sequence for a mutation. 

64. The method of claim 36, wherein the step of modulating HNF function comprises providing an 
HNF1β polypeptide to the animal. 

65. The method of claim 64, wherein the HNF1β polypeptide is a native HNF1β polypeptide. 

66. The method of claim 65, wherein the native HNF1β polypeptide has the sequence of SEQ ID 
NO:91. 

67. The method of claim 64, wherein the provision of an HNF1β polypeptide is accomplished by 
inducing expression of an HNF1β polypeptide. 

68. The method of claim 67, wherein the expression of an HNF1β polypeptide encoded in the animal's 
genome is induced. 

69. The method of claim 67, wherein the expression of an HNF1β polypeptide encoded by a nucleic 
acid provided to the animal is induced. 

70. The method of claim 65, wherein the provision of an HNF1β polypeptide is accomplished by a 
method comprising introduction of an HNF1β-encoding nucleic acid to the animal. 

71. The method of claim 65, wherein the provision of an HNF1β polypeptide is accomplished by 
injecting the HNF1β polypeptide into the animal. 

72. The method of claim 36, wherein the step of modulating HNF function in the animal comprises 
providing a modulator of HNF1β function to the animal. 

73. The method of claim 72, wherein the modulator of HNF1β function is an agonist of HNF1β. 

74. The method of claim 72, wherein the modulator of HNF1β function modulates transcription of an 
HNF1β-encoding nucleic acid. 



WO 98/11254 PCT/US97/16037 


75. The method of claim 72, wherein the modulator of HNF1β function modulates translation of an 
HNF1β-encoding nucleic acid. 

76. A method of screening for modulators of HNF function comprising the steps of: 

a) obtaining an HNF polypeptide; 

b) determining a standard activity profile of the HNF polypeptide; 

c) contacting the HNF polypeptide with a putative modulator; and 

d) assaying for a change in the standard activity profile. 

77. The method of claim 76, wherein the HNF polypeptide is an HNF1α polypeptide. 

78. The method of claim 77, wherein the standard activity profile of the HNF1α polypeptide is 
determined by measuring the binding of the HNF1α polypeptide to a nucleic acid segment comprising the 
sequence of SEQ ID NO: 9. 

79. The method of claim 78, wherein the nucleic acid segment comprising the sequence of SEQ ID 
NO: 2 comprises a detectable label. 

80. The method of claim 77, wherein the HNF1α polypeptide comprises a detectable label. 

81. The method of claim 77, wherein the standard activity profile of the HNF1α polypeptide is 
determined by determining the ability of the HNF1α polypeptide to stimulate transcription of a reporter 
gene, the reporter gene operatively positioned under control of a nucleic acid segment comprising the 
sequence of SEQ ID NO: 1. 

82. The method of claim 76, wherein the HNF polypeptide is an HNF4α polypeptide. 

83. The method of claim 82, wherein the standard activity profile of the HNF4α polypeptide is 
determined by measuring the binding of the HNF4α polypeptide to an amino acid segment comprising the 
sequence of SEQ ID NO:85. 

84. The method of claim 83, wherein the nucleic acid segment comprising the sequence of SEQ ID 
NO: 1 comprises a detectable label. 

85. The method of claim 82, wherein the HNF4α polypeptide comprises a detectable label. 

86. The method of claim 82, wherein the standard activity profile of the HNF4α polypeptide is 
determined by determining the ability of the HNF4α polypeptide to stimulate transcription of a reporter 
gene, the reporter gene operatively positioned under control of a nucleic acid segment comprising the 
sequence of SEQ ID NO:78. 

87. The method of claim 76, wherein the HNF polypeptide is an HNF1β polypeptide. 

88. The method of claim 89, wherein the HNF1β polypeptide comprises a detectable label. 

89. The method of claim 88, wherein the standard activity profile of the HNF1β polypeptide is 
determined by determining the ability of the HNF1β polypeptide to stimulate transcription of a reporter 
gene, the reporter gene operatively positioned under control of a nucleic acid segment comprising the 
sequence of SEQ ID NO: 128. 

90. A method of screening for modulators of HNF function comprising the steps of: 

a) obtaining an HNF-encoding nucleic acid segment; 

b) determining a standard transcription and translation activity of the HNF nucleic acid 
sequence; 

c) contacting the HNF-encoding nucleic acid segment with a putative modulator; 

d) maintaining the nucleic acid segment and putative modulator under conditions that 
normally allow for HNF transcription and translation; and 

e) assaying for a change in the transcription and translation activity. 

91. An HNF modulator prepared by a process comprising screening for modulators of HNF function 
comprising: 

a) obtaining an HNF polypeptide; 

b) determining a standard activity profile of the HNF polypeptide; 

c) contacting the HNF polypeptide with a putative modulator; and 

d) assaying for a change in the standard activity profile. 

92. An HNF modulator prepared by a process comprising screening for modulators of HNF function 
comprising: 

a) obtaining an HNF-encoding nucleic acid segment; 

b) determining a standard transcription and translation activity of the HNF nucleic acid 
sequence; 

c) contacting the HNF-encoding nucleic acid segment with a putative modulator; 

d) maintaining the nucleic acid segment and putative modulator under conditions that 
normally allow for HNF transcription and translation; and 

e) assaying for a change in the transcription and translation activity. 

93. An isolated and purified polynucleotide having an HNF1α-encoding nucleic acid sequence. 

94. The polynucleotide of claim 93, wherein the HNF1α encoded has an amino acid sequence as set 
forth in SEQ ID NO: 127. 

95. The polynucleotide of claim 93, wherein the HNF1α-encoding nucleic acid sequence has a 
sequence of SEQ ID NO: 126. 

96. An isolated and purified polynucleotide having an HNF1β-encoding nucleic acid sequence. 

97. The polynucleotide of claim 96, wherein the HNF1β encoded has an amino acid sequence as set 
forth in SEQ ID NO: 139. 

98. The polynucleotide of claim 96, wherein the HNF1β-encoding nucleic acid sequence has a 
sequence of SEQ ID NO: 128. 

99. An isolated and purified nucleic acid segment comprising 15 contiguous nucleic acids identical to 
the sequence of SEQ ID NO: 128 or SEQ ID NO: 126. 

100. The isolated and purified nucleic acid segment of claim 99, wherein said segment encodes a 
full-length HNF polypeptide. 

101. The isolated and purified nucleic acid segment of claim 100, wherein said segment encodes a 
promoter for the expression of an HNF polypeptide. 
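
Claim 35 recites an RFLP procedure. As a hedged illustration of the underlying principle, the Python sketch below checks whether a point mutation creates or removes a restriction site. It uses the 17-base sequences of SEQ ID NO: 141 (which matches the exon 4 sequence shown in FIG. 8E) and SEQ ID NO: 143, together with the MspI recognition sequence CCGG. The choice of enzyme and the function name `site_positions` are assumptions made for this example and are not taken from the specification.

```python
def site_positions(seq, site):
    """Return the 1-based start position of every occurrence of site in seq."""
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start + 1)
        start = seq.find(site, start + 1)
    return positions


MSPI_SITE = "CCGG"             # MspI recognition sequence (enzyme choice is an assumption)
NATIVE = "GCGGGACCGGATCAGCA"   # SEQ ID NO: 141
VARIANT = "GCGGGACTGGATCAGCA"  # SEQ ID NO: 143

native_sites = site_positions(NATIVE, MSPI_SITE)
variant_sites = site_positions(VARIANT, MSPI_SITE)

# Positions are directly comparable because the two sequences have equal length.
lost = sorted(set(native_sites) - set(variant_sites))
gained = sorted(set(variant_sites) - set(native_sites))
print("sites lost:", lost, "sites gained:", gained)
```

In this illustration the single-base change removes the only CCGG site in the 17-base window, so a digest of an amplified fragment spanning this position would be expected to give a different fragment pattern for the two alleles.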

Frameshift mutation, insertion of C in codon 289, Exon 4; CCC→CCCC 
Missense mutation, codon 131, Exon 2; CGG (Arg)→CAG (Gln) 
Splicing mutation - splice acceptor site of Intron 5; AG→GG 
Splicing mutation - splice donor site of Intron 9; GT→AT 
Frameshift mutation - deletion of TG in codons 547-548, Exon 9; ACT GAG→ACAG 



FIG. 5F. A Pedigree (pedigree diagram not reproduced; the genotype labels NN and NM appear in the diagram) 

Missense mutation, codon 447, Exon 7; CCG→CTG, Pro→Leu 
Frameshift mutation - CT deletion codon 379, Exon 6; CCT→C 
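
The mutation annotations in the figure captions above can be checked mechanically. The Python sketch below classifies a reference-to-variant change as a frameshift (length change not divisible by three) or, for single-codon substitutions, as silent, nonsense or missense using the standard genetic code. The function name `classify` and its output wording are illustrative assumptions, not part of the specification.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify(ref, alt):
    """Classify a ref -> alt sequence change written as in the figure captions."""
    change = len(alt) - len(ref)
    if change % 3 != 0:
        kind = "insertion" if change > 0 else "deletion"
        return f"frameshift ({kind} of {abs(change)} nt)"
    if len(ref) == 3 and len(alt) == 3:
        before, after = CODON_TABLE[ref], CODON_TABLE[alt]
        if before == after:
            return f"silent ({before})"
        if after == "*":
            return f"nonsense ({before} -> stop)"
        return f"missense ({before} -> {after})"
    return "in-frame change"


for ref, alt in (("CGG", "CAG"), ("CCG", "CTG"), ("CCC", "CCCC"),
                 ("CCT", "C"), ("ACTGAG", "ACAG")):
    print(ref, "->", alt, ":", classify(ref, alt))
```

The sketch does not cover the splice-site changes (AG→GG, GT→AT), which lie in introns and are not interpreted through the codon table.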

FIG. 8A. Partial Sequence of Human HNF4 Gene 
(Exon 1 SEQ ID NO:34) 

GCAGAGAGGG CACTGGGAGG AGGCAGTGGG AGGGCGGAGG 
GCGGGGGCCT TCGGGGTGGG CGCCCAGGGT AGGGCAGGTG 
GCCGCGGCGT GGAGGCAGGG AGAATGCGAC TCTCGAAAAC 
CCTCGTCGAC ATGGACATGG CCGACTACAG TGCTGCACTG 
GACCCAGCCT ACACCACCCT GGAATTTGAG AATGTGCAGG 

TGTTGACGAT GGGCAATGGT AGGTGGGGGC AGATGTGCCC 
CA GTGGGGG CAG GTGTGCCTGG GTCCAGGAGC 
AGATCTTTGG CACTCAACTT TGGGGTGGGA GGAGAATGAT 
ACAAAATGGT AGGTTGGTCC TACAGGCCAG CACAGGTGTT 
GCCAAGTGAA GCCCATGTGC CCAGGCACAG TGATCACAGG 

CATTCTGGGT GAAGGGAGGC CTGCAAGGGC CAATTTCCAG 
CAAAAGTCGA TCCCGGCTAT TCCTCCCAGG CCCTTCCAGT 
CCTCACTGCC TCACAGTGGC TCTGCTTGGC GCTTGGCACA 
^ G ^T ATGAT GGTGAGCTCC CCCTTGGTGC CCAGCTCCAG 
CGATTCAGCC CAGCACGGCC CCTTCGTGAA CCCCTTGGGC 

SI^S GTTCAG AGAQ ACGGCA AGGGATGTTG TATCCCTGGA 
GATGGTGGTT GGAGACATAA CCGCATTTCT C 

FIG. 8B. Partial Sequence of Human HNF4 Gene 
(Exon 1b SEQ ID NO:36) 



TGGATGTTTG TACATGTGTG CTGTGTGTGC GGGTCATAGA 
GCACATGTGT TTGTGCATGC GGACCTGTTG GAGTGGCGTG 
TTCTTCCTGC ATCTTTATCC TGTATGGGCG TTTTGTCGTG 
TGCCCATATT TGTACCTGCT GTGTATATAT GCAGTTCCCT 
GTGCTGCGGG CGGGGGTCAG CGGTCTCTGG TGTGCACGAC 

TGCACAGACC CAAATGCAGG ACTCTGTTGT TGCCACTCAC 
CAAGTGAGAT TCATATCAGC AACATGTCCG TTTGTCTCTG 
AGCAGATTTG TTGCCGCTGC GTCTCGCCAG ATTGAGGCAT 
CCCCTCCGAC ATCACTGGAG CATATCTGGA GGGGTGGACA 
GTTCTCCACA GGGAGGTAGG GGAAAAGAGG AGGCCCGGAA 

ACCCCTCCTG GAGGGAAGAG CCCCATCGGT CCCAGGCCAG 
CCTCAGAGGA GAGGGGGGAG GCAGCTGGCT GAGGTCAGCC 
TYGCCACCCTG CTTCCTTCTG TGTCTTGGAG CCACTCAGCC 
AGTATGAGGC TGCAGCTCCA GCTGAGGTCT GGAATCTTGT 
GGTCAGCTCA GCTAGGGTGA GGAGGCAGCT GCTGGGCACT 

GCTTGTTGTC AGCTCAGCAG GTGCTCACCT GCCCCTGCCG 
TCCAGTCACG TGTGACCTTG GGCATGTCAC CTCCCCTATC 
CTGGCTTCTG TATCTTCTAC AAAACAGGCT TCATTCCCCC 
AGGCCTGCTG GCTGGACGGC TTTTAGGCCT GTCTGAGGAC 
CACGCCAGGA GCGCAAGGCA AAAACACACC AGAGAT 

FIG. 8C. Partial Sequence of Human HNF4 Gene 
(Exon 2 SEQ ID NO:38) 



CCCCTTGCGA GTTAGGAGGC CGGCTCCCAC CCCAGAAGGT 
GGCCAGGTTT TCATGCCTTC CTAGAGAAAG CTGGGGCTGC 
TGGCCTCCAC CACAGGGAGA CGCAGACCCT CAGAAACAAG 
TCTGTGAAGT CACAACCAGC CCCAGTTTAC AGATGTGAAA 
CTGAAGCTCC AAAAAGTCAG GAGGTCACTG AGTGGGGAGG 

TGATGGAGTG GAACAGCCCC CAGATCTGGC TGAGGCCGAA 
GCCCTGGAGA GATCCCCGCA AGGCTCCCTT AGATGCCTGA 
CATTCTGTTC TTCCTGAAGC CTCACTCCCT TCTCTCCTGG 
CGCAGACACG TCCCCATCAG AAGGCACCAA CCTCAACGCG 
CCCAACAGCC TGGGTGTCAG CGCCCTGTCT GCCATCTGCG 

GGGACCGGGC CACGGGCAAA CACTACGGTG CCTCGAGCTG 
TGACGGCTGC AAGGGCTTCT TCCGGAGGAG CGTGCGGAAG 
AACCACATGT ACTCCTGCAG GTGAGGAGCC TCAATTTCTT 
CAGCTGGGAA ATGGGCACAC TTGGGCTCAT GGCCCCAAGG 
TCTGTCTTCT CCCTGAGTGG GTAGGTCCCA GAGACAGCTG 

CCCTTCAGGG CCTTCAAGGC TCCTTCTGGTT TTGT 

FIG. 8D. Partial Sequence of Human HNF4 Gene 
(Exon 3, SEQ ID NO:40) 

AGAGAGTTCA TAGCACCTTT CCAGCTCCTG GTGGGTTCAA 
GAGAGAACTC CCGGGATGAA GAGATGAGAG CACTGAGGTT 
GGGGGGTCAA CTGGATAGCC AGGGCCCTAG TTCTGTCCTA 
AGAGGAGGAA GTTGTGTCTT CTCCATCCAA CCATCCAAAAG 
ACCTCCCCAG ATTTAGCCGG CAGTGCGTGG TGGACAAAGA 

CAAGAGGAAC CAGTGCCGCT ACTGCAGGCT CAAGAAATGC 
TTCCGGGCTG GCATGAAGAA GGAAGGTGAG CCTCGGCCCT 
CCCCGCCCCA CCACCACTGC ACCACCTGCA CCCACAGCTC 
CCCGACAGTC ATTTACAACT GTAGCCACAC TTTATGACTC 
AGTGGCAGGC CCCAGGGTGA CTGGCTAATG GCTGAGAAGA 

GGGAGGGCCT GGAAATCTGA CCATAGGGAG CGGCTGGGCT 
TGGTCTTGAG AAAGATTC 

FIG. 8E. Partial Sequence of Human HNF4 Gene 
(Exon 4 SEQ ID NO:42) 

tcccactcct catcagtcac agacaccccc accccctact 

ccatccctgt tctccctcct cacctctctg tgcctcctca 

cagCCGTCCA GAATGAGCGG GACCGGATCA GCACTGGAAG 

GTCAAGCTAT GAGGACAGCA GCCTGCCCTC CATCAATGCG 

CTCCTGCAGG CGGAGGTCCT GTCCCGACAG GTACCGGGGT 

GATCCTGCCA CCCACCCAGG GGATCCCCCA CACTACAGAG 
GAGCTCACCT CCTCCACCTC CATTCTCCCC AGCCAGGCCC 
TGGAGCAGCT GACGGGAGGG GCCTCAGATA TTACAGAAGG 
GACACTGAGT GCGGTTTCAC ATGGCCCAGT TTGCAGCAAG 
GGCAGGAATC GAACCTGGCG CCCTGGGGCA CTTTCTAATT 

CATCCTACTG CCTGCATCCC ACAGGCCAAG CAGAGTCTTC 
ACCTTCACTG AGGGCCTGCG ATCAGCTCAG CTCCGAGAGA 
ACAGAGCAGT GGCTCAGTGG AGAGAGGTGG CAAAGTGGGG 
CCCAGCCCTT CCCTTGCTGA GTGACCTTGG GCAAGTCACA 
GCACCTCTCT GAGCCATGGT TGCCTCATTG TCAGAAAAGG 

ATGATGATTT TTTGCCTGC TTCTCCTCTA AGGCTGACAG 
ACTCCTTGGG GCTCTAAAGC TG 

FIG. 8F. Partial Sequence of Human HNF4 Gene 
(Exon 5, SEQ ID NO:44) 



TTCTCCTCA TCCCTGCCTC CTCCCTCCCT CCGTTTTTAC 
CCTGAGCTTC CTTCAGAGCT GGAGGGCACC CACTATCCAG 
CCCCCTCCCC ACATCTGATT CCAGGGAGGG GGCTCTGTGC 
AGGGGACAGA GAATGCGGGA GGGCCCGGAC ATCTCCAGCA 
TTTTCTTCCC TGTATCTCTC GAAGATCACC TCCCCCGTCT 

CCGGGATCAA CGGCGACATT CGGGCGAAGA AGATTGCCAG 
CATCGCAGAT GTGTGTGAGT CCATGAAGGA GCAGCTGCTG 
GTTCTCGTTG AGTGGGCCAA GTACATCCCA GCTTTCTGCG 
AGCTCCCCCT GGACGACCAG GTGAGGATGG GCGTGGATGG 
TGGGCAGTAG TGGGCAGTGG GCGGGGCAGC CAGGGGGCTG 

CTGGCCCACC TGGGATATAG CCGTGGACTG GCTTGATTTT 
ATTTTATTTA ACAAAATATG TAGTGCACAC ACGTGTCTGA 
AACTTTAAAT CACCTTACAA ATATTAACTC AGTTAGCTCC 
TCCAACAACT CTATGAGGTA GGTACTAAGG TACTATTATT 
ACTGCCATCT CATAGGTGAG AGATTGGGGC ACAGAGAGGT 

TAAGTAACCT GCTCAAGGTC ACATAGCTAC TATCCAGCAT 
AGCTGGG 

FIG. 8G. Partial Sequence of Human HNF4 Gene 
(Exon 6, SEQ ID NO:46) 



ATTTTTACAA AGCACCCTTC ATAATTCTCC ATAGCTGGTC 
CATGGGTGGG AATTTGGGAC CCACAGTTTT GGAACTTTTT 
GGGATCATAG ACCTTTTTGA GAATCTCAAA AAAGAAAAAA 
AAGCACACAG AATGTTGCTT ACAGTTTCAT CAGGCACACA 
GAAGAGGCCC AGCACGAAGC AGTTTCTTGC CCAAGGACAC 

AGCAGTTCAA GGACAGAGTC AGCGCGAGGT CTCTCAGCTC 
TGAGCACATG TTCTTTCCCC TTCCAGGTTT CTAGTTTTAT 
GGGTAGTAGT TTTATGATGC CCATTTCACA GTTCAGGCAG 
GTAGAGGCAG AGGGGAGCAT TAAGCTGACT TGCCCAGCGT 
CACTGAGTTG GCTACGGGCA GCCTTCCCAA GGGTACAGAT 

GGCAAACACT GTTCCTTATC TCTTTCAGGT GGCCCTGCTC 
AGAGCCCATG CTGGCGAGCA CCTGCTGCTC GGAGCCACCA 
AGAGATCCAT GGTGTTCAAG GACGTGCTGC TCCTAGGTGA 
GGCGGCTGCC TGCCCTGGCC AGGGCTCCAG GGAGGGTATG 
CCTAGCATGG CACTCACCCA GGCAAGGAGA TTCACATGGT 

GGCATGCAAG GGTGAGGGAG ACTAGTCAGG AGTGGCCCTG 
TCCTCAGGCT TGCATTGGAG GGCTCCAGGA CTCAGTTTTC 
AACTGGGTAC CCCACTCAGA TGCAAGGAAA TGTGGATGCA 
AGTCACCAAA TTCCCAGCAT TGAAGTCAGA GCACGATCAG 
GGTTATCCCT GGAATTACCT GTGCATCCTT TTTTCTTTTG 

ACAGAGTCTT GCTCTGTCAC TCAGGCTGGA GTGCAATGAT 
GTGA 

FIG. 8H. Partial Sequence of Human HNF4 Gene 
(exon 7, SEQ ID NO:48) 

GCAACACTAG TATTTTAATA TAACAATGCT ATGAGGGAGC 
TCGATTATTT ATCCTCATCT TATAGATAAG AAAACTGAGG 
CACAGAGAGG TTAAGTAACT TATCCAACTA TAACCAGCTA 
TCAGGGGCAG AGCCATTTAA GCAGGGCAGT GCAGTTCCAG 
AATCTGGTCC TTTAACCTTG ATGCTTTGGT GCCTATCAGG 

TGACCTTTGA ATGTCATCGA TCTTGTGAGT CATGTTGGTA 
AATGGAGCTT GGGTCATGTG AAAGAGGTCC TAGAAAGCCA 
AGTTCCAAGC TCAGCCGGAT GACTCAAGGC AGCTTATCTT 
CTGAATCTGG GCCTCAGCTT CCTTACCTGT GAAATGGGAG 
TCACCATCCC TGCAGGTCCT CCTCCCACAG GCACCAGCTA 

TCTTGCCAAC TTAAAAGCCA AAACTAGAGG AGAGGGGTCA 
ACCCAAAGTG ACTTCCCATC CTCCCTCCCT CCCAACCCTT 
CCAGGCAATG ACTACATTGT CCCTCGGCAC TGCCCGGAGC 
TGGCGGAGAT GAGCCGGGTG TCCATACGCA TCCTTGACGA 
GCTGGTGCTG CCCTTCCAGG AGCTGCAGAT CGATGACAAT 

GAGTATGCCT ACCTCAAAGC CATCATCTTC TTTGACCCAG 
GTACAGTGCA CACCTCCTAA GCCATCCCTG ACTCTCTCTC 
CAGAACGCTC TGCCAGACTT CTCCTATTGG GTTCTGTACA 
CTGAGTTCAC AGCCTCATCT CATGTTAACG ACAGCCAGGA 
GAGGCCGTTT TCATTTAACA GATGAGGCAA GTCAAGATTT 

GAAGAGACAA TATGGCCGGG CGCAGTGGCT CACACCTGTA 
ATCCCATCAC TTTGGGAGGC TGAGGCGGGC GGATCACCTG 
AGGTCAGGGG TCAAGATGAG CCTGGCTAAC ATGGAGAAAC 
CCCATCTCTA CTTAAAA 

FIG. 8I. Partial Sequence of Human HNF4 Gene 
(Exon 8 SEQ ID NO:50) 



GTGGCTCTGC CAACAACTGG CTGTGCGACC CAGGACAAGT 
CCTATCTTTG CACTGTGTCT GGGTTTCCCC GTGTGTAAGA 
TGAGGCGGTT GCTAGGTGCT TATTGGATGC ATTCCTCAAG 
TCCCGCCCTC CATCTCCTAT TCCCCTCTCT TCTGGTTTAG 
TGCTTTAGGA AATGTGGCAG AAATCTTTTT CTGCCTGTGT 

CTAGGAAATC ATAATTCATG CTGGCGTACC CTGGTTGTTG 
AGGTCCCTGA ATCCTTGTGC CCACACTGCT GAAGACTCCT 
TGTGTGACAC AAGTCAGGGG ACATCTGGGT CTTGACTCCC 
CAGATGCTCC AGGTGGACCC TGCTGCCCTC CCTTGCCCAC 
CCTCTTCCAT TGTAGATGCC AAGGGGCTGA GCGATCCAGG 

GAAGATCAAG CGGCTGCGTT CCCAGGTGCA GGTGAGCTTG 
GAGGACTACA TCAACGACCG CCAGTATGAC TCGCGTGGCC 
GCTTTGGAGA GCTGCTGCTG CTGCTGCCCA CCTTGCAGAG 
CATCACGTGG CAGATGATCG AGCAGATCCA GTTCATCAAG 
CTCTTCGGCA TGGCCAAGAT TGACAACCTG TTGGAGGAGA 

TGCTGCTGGG AGGTCCGTGC CAAGCCCAGG AGGGGCGGGG 
TTGGATTGGG GACTCCCCAG GAGACAGGCC TCACACAGTG 
AGCTCACCCC TCAGCTCCTT GGCTTCCCCA CTGTGCCGCT 
TTGGGCAAGT TGCTTAACCT GTCTGTGCCT CAGTTTCCTC 
ACCAGAAAAA TGGGAACAAG GCAATGGTCT ATTTGTTCAG 

GCACCGAGAA CCTAGCACGT GCCAGTCACT GTTCTAAGTG 
CTGGCAATTC AGCAAAGAAC AAGATCTTTG CCCTCGGGGA 
GGCTGTGTGT GTGTGATAT GTATGGATGC GTGGATATCT 
GTGTATATGC CGGTATGTGC GTGCATGTGT ATATAAAGCC 
TCACATTTTA TGATTTTGA 

FIG. 8J. Partial Sequence of Human HNF4 Gene 
(exon 9, SEQ ID NO:52) 



GGGACACATA GATGCTATAA GTAGGTCAGT TGGCTGCAGC 
AGAGATGTGG GGGATGAGGC TGAAAGGTGA GGCGGGAGCA 
AATGGTTGAA GGACTTGCAC TCCAAGGAGC TTTGAGAGCC 
ATTGATTACA TCCATTATGT TACTATGTGA CCAATACATT 
ACTCATTAGA ACATTTACGT GATCTCAGAG CTTCCTTATA 

TGCACCTTGT TCCTTTCAAC TCACTTTTGT TCTCTTGGTT 
TTTTGGGGTC CTCTTAACAC CCTCATGAAG TCTATAGATG 
GGAATGGTAC ACCCTAGTTT ACTAACCCAG GAATAGGTAC 
CCAACAGGCA CTGCCAATAT TGGATGGGCT GGTTGATTGG 
CCACGCCTGA GGAAGATGGC GTCCCAAGGC CTGAGGTCTG 

CATCCCAGAC TCTCCATCCT GATCGACCTT CTCTACCTGC 
AGGGTCCCCC AGCGATGCAC CCCATGCCCA CCACCCCCTG 
CACCCTCACC TGATGCAGGA ACATATGGGA ACCAACGTCA 
TCGTTGCCAA CACAATGCCC ACTCACCTCA GCAACGGACA 
GATGTGTGAG TGGCCCCGAC CCAGGGGACA GGCAGGTGGG 

CAAACTCTGG GATTTTACCT TGCAAAGGGT GAGGATGGGG 
CTTAAGACAG GAGGCAGGAG AAAGTGGAGT CTAGAAGGTA 
GAACCAGGAT GCAACAGTTT TCTGGGTTCC AGGGTAGGGA 
ATAAAGGGCA AGATTGTCCA TTTGTTGAGG CTGTTTATTC 
AGTAAGGTGA CTGACAGCCT TTACTGAATG AAGCCATTGT 

TGGGATGAGG CAATCCACTG GATGAGGTAA CCCATTGGGT 
GAAGATGTCT TGGGTGAGAA TTCCATTAGT TGACATTGTC 
CATTAAGTAA AAGTGGTCAT TGAAGTAAGG CTGCACAGTT 
GGGTAAGGCT ATCCATTAGA CATTAGATGA GACTACCCAT 
TGGGTCAGGA TGTCTGCTGG GCTA 

FIG. 8K. Partial Sequence of Human HNF4 Gene 
(Exon 10 SEQ ID NO:54) 



TTTGGGAGAA GCAGTCCAAG TCTGCATATC AAATAAATGA 
TGGAGGAGAT GGGTGGTAGG ACCTTCCAGA CCTCATAAAA 
CTTAGGCTTT ATGATCTGGG ACTCACAGAA GGTTGAGCAA 
TAAAAGACCT TAGGGATTAT CTGGCTTAAT TAATTCTCTC 
ATTTTATAGA GGAAGAAATT AAGTCAAGGT GGGGCAGGGT 

GGGAGGGGAG AACTTTCCCG GGGCTCTTCA TTTACTCCCA 
CAAAGGCTGG AATTTTGAGC AGCCCCTGTC TGTCTGTTTG 
TCCTTCCAGC CACCCCTGAG ACCCCACAGC CCTCACCGCG 
AGGTGGCTCA GGGTCTGAGC CCTATAAGCT CCTGCCGGGA 
GCCGTCGCCA CAATCGTCAA GCCCCTCTCT GCCATCCCCC 

AGCCGACCAT CACCAAGCAG GAAGTTATCT AGCAAGCCGC 
TGGGGCTTGG GGGCTCCACT GGCTCCCCCC AGCCCCCTAA 
GAGAGCACCT GGTGATCACG TGGTCACGGC AAAGGAAGAC 
GTGATGCCAG GACCAGTCCC AGAGCAGGAA TGGGAAGGAT 
GAAGGGCCCG AGAACATGGC CTAAGGCACA TCCCACTGCA 

CCCTGACGCC CTGCTCTGAT AACAAGACTT TGACTTGGGG 
AGACCCTCTA CTGCCTTGGA CAACTTTCTC ATGTTGAAGC 
CACTGCCTTC ACCTTCACCT TCATCCATGT CCAACCCCCG 
ACTTCATCCC AAAGGACAGC CGCCTGGAGA TGACTTGAGC 
CTTACTTAAA CCCAGCTCCC TTCTTCCCTA GCCTGGTGCT 

TCTCCTCTCC TAGCCCCGGT CATGGTGTCC AGACAGAGCC 
CTGTGAGGCT GGGTCCAATT GTGGCACTTG GGGCACCTTG 
CTCCTCCTTC TGCTGCTGCC CCCACCTCTG CTGCCTCCCT 
CTGCTGTCAC CTTGCTCAGC CATCCCGTCT TCTCCAACAC 
CACCTCTACA GAGGCCAAGG AGGCCTTGGA AACGATTCCC 

CCAGTCATTC TGGGAACATG TTGTAAGCAC TGACTGGGAC 
CAGGCACCAG GCAGGGTCTA GAAGGCTGTG GTGAGGGAAG 
ACGCCTTTCT CCTCCAACCC AAC 

Box I Observations where certain claims were found unsearchable (Continuation of item 1 of first sheet) 

This International Search Report has not been established in respect of certain claims under Article 17(2)(a) for the following reasons: 

1. Claims Nos.: 
because they relate to subject matter not required to be searched by this Authority, namely: 

2. Claims Nos.: 
because they relate to parts of the International Application that do not comply with the prescribed requirements to such 
an extent that no meaningful International Search can be carried out, specifically: 

see FURTHER INFORMATION sheet PCT/ISA/210 

3. Claims Nos.: 
because they are dependent claims and are not drafted in accordance with the second and third sentences of Rule 6.4(a). 

Box II Observations where unity of invention is lacking (Continuation of item 2 of first sheet) 

This International Searching Authority found multiple inventions in this international application, as follows: 

1. As all required additional search fees were timely paid by the applicant, this International Search Report covers all 
searchable claims. 

2. As all searchable claims could be searched without effort justifying an additional fee, this Authority did not invite payment 
of any additional fee. 

3. As only some of the required additional search fees were timely paid by the applicant, this International Search Report 
covers only those claims for which fees were paid, specifically claims Nos.: 

4. No required additional search fees were timely paid by the applicant. Consequently, this International Search Report is 
restricted to the invention first mentioned in the claims; it is covered by claims Nos.: 

No protest accompanied the payment of additional search fees. 

This international search report has not been established in respect of 
certain claims under Article 17(2)(a) for the following reasons: 

Claims Nos.: 13, 14, 17, 18, 31, 66, 77-89, 99-101 

because they relate to parts of the international application that do not 
comply with the prescribed requirements to such an extent that no 
meaningful international search can be carried out, specifically: 

In the aforementioned claims, DNA and protein/polypeptide sequences are 
emphasized which do not correspond to the type of sequence within the 
sequence listing of the application, i.e. the applicant mentions SEQ IDs 
which should represent nucleic acid sequences, but these relate to amino 
acid sequences, and vice versa (= Obscurity). In addition, some of the 
claimed SEQ IDs relate to entities which do not correspond to the 
entities referred to in the claim (i.e. claim 18 relates to cDNA encoding 
the complete protein HNF-1beta whereas SEQ ID NO:90 relates to a 20 bp 
oligonucleotide = Inconsistency). Thus, based on Article 6 (PCT), an 
incomplete search was carried out. 